The present invention is directed toward a system and process that controls a
group of networked electronic components using a multimodal integration scheme
in which inputs from a speech recognition subsystem, a gesture recognition subsystem
employing a wireless pointing device, and a pointing analysis subsystem also employing
the pointing device are combined to determine which component a user wants to control
and what control action is desired. In this multimodal integration scheme, the
desired action concerning an electronic component is decomposed into a command
and referent pair. The referent can be identified by pointing the pointing device
at the component or at an object associated with it, by using speech recognition,
or by both. The command may be specified by pressing
a button on the pointing device, by a gesture performed with the pointing device,
by a speech recognition event, or by any combination of these inputs.
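
The decomposition of a desired action into a command and referent pair can be illustrated with a minimal sketch. The text above does not state how the inputs are combined, so the simple agreement rule used here (take the most frequently reported value and treat a tie as unresolved) is an assumption made only for illustration. The names `ModalityInputs`, `integrate`, and `_resolve`, and the string labels for components and commands, are likewise hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class ModalityInputs:
    """Hypothetical snapshot of what each subsystem has reported."""
    pointed_referent: Optional[str] = None  # from the pointing analysis subsystem
    spoken_referent: Optional[str] = None   # from the speech recognition subsystem
    button_command: Optional[str] = None    # from a button press on the pointing device
    gesture_command: Optional[str] = None   # from the gesture recognition subsystem
    spoken_command: Optional[str] = None    # from the speech recognition subsystem


def _resolve(candidates: Iterable[Optional[str]]) -> Optional[str]:
    """Return the most frequently reported value, or None if there is
    no report at all or the top two values are tied (assumed rule)."""
    votes = Counter(c for c in candidates if c is not None)
    if not votes:
        return None
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # conflicting inputs with no majority
    return ranked[0][0]


def integrate(inputs: ModalityInputs) -> Optional[Tuple[str, str]]:
    """Combine the modalities into a (command, referent) pair.

    Returns None when either half of the pair cannot be determined.
    """
    referent = _resolve([inputs.pointed_referent, inputs.spoken_referent])
    command = _resolve(
        [inputs.button_command, inputs.gesture_command, inputs.spoken_command]
    )
    if referent is None or command is None:
        return None
    return (command, referent)
```

In this sketch, pointing at a component while speaking a command, for example `integrate(ModalityInputs(pointed_referent="lamp", spoken_command="turn_on"))`, yields a complete pair, whereas a referent reported without any command yields no action.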