According to an embodiment, an apparatus and method are disclosed for
dynamic gesture recognition from stereo sequences. In an embodiment, a
stereo sequence of images of a subject is obtained and a depth disparity
map is generated from the stereo sequence. The system is initialized
automatically based upon a statistical model of the upper body of the
subject. The upper body of the subject is modeled as three planes,
representing the torso and arms of the subject, and three Gaussian
components, representing the head and hands of the subject. The system
tracks the upper body of the subject using the statistical upper body
model and extracts three-dimensional features of the gestures performed.
The system recognizes the gestures using recognition units, which, in
a particular embodiment, utilize hidden Markov models for the
three-dimensional gestures.
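
By way of illustration only, the recognition step can be sketched as follows, assuming one discrete hidden Markov model per gesture and an input sequence of quantized feature symbols. The gesture names ("raise", "lower"), the three-symbol hand-height alphabet, and all probabilities are hypothetical values chosen for the example and are not taken from the disclosure; the scaled forward algorithm shown is a standard way to score a sequence against a hidden Markov model, not necessarily the one used in any embodiment.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) for a discrete HMM via the scaled forward algorithm."""
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n_states = len(start)
    # Initialise with the start distribution weighted by the first emission.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by the emission.
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        # Rescale to avoid underflow and accumulate the log of the scale factor.
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_prob


def recognize(obs, models):
    """Return the name of the gesture model that scores the sequence highest."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))


# Hypothetical two-state left-to-right models over the symbols
# 0 = hand low, 1 = hand mid, 2 = hand high (illustrative values only).
LEFT_TO_RIGHT = [[0.7, 0.3], [0.0, 1.0]]
MODELS = {
    "raise": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.8, 0.15, 0.05], [0.05, 0.15, 0.8]]),
    "lower": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.05, 0.15, 0.8], [0.8, 0.15, 0.05]]),
}

if __name__ == "__main__":
    print(recognize([0, 0, 1, 2, 2], MODELS))
    print(recognize([2, 2, 1, 0, 0], MODELS))
```

In this sketch each recognition unit is represented by one entry in MODELS, and a gesture is assigned to the model with the highest log-likelihood. A practical system would more likely use continuous observation densities over the extracted three-dimensional features rather than a small discrete alphabet.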