The present invention relates generally to the field of video-camera systems,
such as video conferencing systems, and more particularly to video camera targeting
systems that locate and acquire targets using an input characterizing a target
and a machine-classification system to assist in target acquisition responsively
to that input. In some embodiments, the characterization and classification are
employed together with one or more inputs of other modalities, such as gesture control.
In one example of the system in operation, an operator is able to make pointing
gestures toward an object and simultaneously speak a sentence identifying the
object to which the operator is pointing. At least one term of the sentence, presumably,
is associated with a machine-sensible characteristic by which the object can be
identified. The system captures and processes the voice and gesture inputs and
repositions a pan-tilt-zoom (PTZ) video camera to focus on the object that best matches both the
characteristics and the gesture. Thus, the PTZ camera is aimed based upon both the inputs
the system receives and the system's ability to locate the target using its sensors.
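
By way of illustration only, the selection of the object that best matches both the spoken characteristics and the pointing gesture might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the `Candidate` record, the Gaussian weighting of pointing error, the `sigma_deg` tolerance, and the multiplicative combination of the two scores are all hypothetical choices introduced for this example, and the candidate objects are assumed to have already been detected and labeled by the machine-classification system.

```python
import math
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A detected object (hypothetical record, assumed for illustration)."""
    name: str
    pan_deg: float          # direction of the object as seen from the camera
    tilt_deg: float
    attributes: frozenset   # machine-sensible characteristics, e.g. {"red", "book"}


def gesture_score(c, pan_deg, tilt_deg, sigma_deg=15.0):
    """Score in (0, 1]: higher when the object lies near the pointing direction."""
    err = math.hypot(c.pan_deg - pan_deg, c.tilt_deg - tilt_deg)
    return math.exp(-(err ** 2) / (2.0 * sigma_deg ** 2))


def speech_score(c, sentence):
    """One plus the number of spoken terms that name a characteristic of the object.

    The offset of one lets the gesture alone decide when no term matches.
    """
    terms = set(re.findall(r"[a-z]+", sentence.lower()))
    return 1 + len(terms & c.attributes)


def select_target(candidates, pan_deg, tilt_deg, sentence):
    """Return the candidate that best matches both the gesture and the sentence."""
    if not candidates:
        raise ValueError("no candidate objects to choose from")
    return max(
        candidates,
        key=lambda c: gesture_score(c, pan_deg, tilt_deg) * speech_score(c, sentence),
    )


# Example scene: two objects close together and one farther away.
scene = [
    Candidate("red book", 10.0, 0.0, frozenset({"red", "book"})),
    Candidate("blue mug", 12.0, -2.0, frozenset({"blue", "mug"})),
    Candidate("red mug", 40.0, 5.0, frozenset({"red", "mug"})),
]

# The operator points between the book and the blue mug and names the book.
target = select_target(scene, pan_deg=11.0, tilt_deg=-1.0, sentence="Show me the red book")
print(target.name, target.pan_deg, target.tilt_deg)
```

In this sketch the gesture alone cannot separate the two nearby objects, since both lie at the same angular distance from the pointing direction, so the spoken terms decide the outcome. The pan and tilt values of the selected candidate would then be passed to whatever PTZ control interface the camera provides.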