Architecture for implementing a perceptual user interface. The
architecture comprises alternative modalities for controlling computer
application programs and manipulating on-screen objects through hand
gestures or a combination of hand gestures and verbal commands. The
perceptual user interface system includes a tracking component that
detects object characteristics of at least one of a plurality of objects
within a scene, and tracks the respective object. Detection of object
characteristics is based at least in part upon image comparison of a
plurality of images relative to a coarse mapping of the images. A seeding
component iteratively seeds the tracking component with object hypotheses
based upon the presence of the object characteristics and the image
comparison. A filtering component selectively removes the tracked object
from the object hypotheses and/or at least one object hypothesis from the
set of object hypotheses based upon predetermined removal criteria.
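
The seeding and filtering steps described above can be illustrated with a
minimal sketch. Everything in it is an assumption made for illustration and
is not taken from the abstract: frames are plain 2D lists of grayscale
values, the "coarse mapping" is modeled as a grid of fixed-size cells, the
"image comparison" is a mean absolute difference between two frames, and
the removal criteria are "already tracked" and "not among the strongest
candidates". The names Hypothesis, coarse_difference, seed, and
filter_hypotheses are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    row: int      # coarse-grid row of the candidate object
    col: int      # coarse-grid column of the candidate object
    score: float  # mean absolute frame difference inside the cell


def coarse_difference(prev, curr, block):
    """Compare two equal-size frames on a coarse grid of block x block cells.

    Returns a grid whose entries are the mean absolute pixel difference
    within each cell (an assumed stand-in for the image comparison).
    """
    rows, cols = len(prev) // block, len(prev[0]) // block
    grid = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0
            for y in range(r * block, (r + 1) * block):
                for x in range(c * block, (c + 1) * block):
                    total += abs(curr[y][x] - prev[y][x])
            grid[r][c] = total / (block * block)
    return grid


def seed(grid, min_score):
    """Propose one hypothesis per coarse cell whose difference is large enough."""
    return [
        Hypothesis(r, c, s)
        for r, row in enumerate(grid)
        for c, s in enumerate(row)
        if s >= min_score
    ]


def filter_hypotheses(hypotheses, tracked, max_keep):
    """Apply the assumed removal criteria.

    Drops hypotheses whose cell is already tracked, then keeps only the
    max_keep strongest remaining candidates.
    """
    fresh = [h for h in hypotheses if (h.row, h.col) not in tracked]
    fresh.sort(key=lambda h: h.score, reverse=True)
    return fresh[:max_keep]


# Two 4x4 frames: strong change in the top-left cell, weak change bottom-right.
prev_frame = [[0] * 4 for _ in range(4)]
curr_frame = [[0] * 4 for _ in range(4)]
for y in range(2):
    for x in range(2):
        curr_frame[y][x] = 100
curr_frame[3][3] = 40

grid = coarse_difference(prev_frame, curr_frame, block=2)
candidates = seed(grid, min_score=5.0)
# The top-left cell is treated as already tracked, so it is filtered out.
remaining = filter_hypotheses(candidates, tracked={(0, 0)}, max_keep=3)
```

In an iterative setting, the three calls at the end would run once per new
frame, with the surviving hypotheses handed to the tracking component and
the tracked set updated accordingly.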