Multimodal utterances combine input from a number of different modes.
These modes can include speech, gesture, pen, haptic, and gaze input.
This invention uses recognition results from one or more of these modes to
compensate the recognition process for one or more of the other modes. In
various exemplary embodiments, a multimodal recognition system inputs one or
more recognition lattices from one or more of these modes and generates one
or more models that one or more mode recognizers use to recognize the other
modes. In one
exemplary embodiment, a gesture recognizer inputs a gesture input and
outputs a gesture recognition lattice to a multimodal parser. The
multimodal parser generates a language model and outputs it to an
automatic speech recognition system, which uses the received language
model to recognize the speech input that corresponds to the recognized
gesture input.
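The gesture-to-speech pipeline above can be illustrated with a minimal sketch. It makes simplifying assumptions that are not taken from the source. The gesture recognition lattice is flattened to a dictionary of alternative gesture hypotheses with costs (lower is better). The "language model" is reduced to a set of licensed phrases with costs. The speech recognizer only chooses among whole-phrase hypotheses. The gesture labels, `GRAMMAR`, `build_language_model`, and `recognize_speech` are all hypothetical illustrations. They are not the lattice or grammar machinery of the described system.

```python
from typing import Dict, List, Optional

# Hypothetical: a gesture lattice flattened to {gesture hypothesis: cost}.
GestureLattice = Dict[str, float]

# Hypothetical multimodal grammar: each gesture type maps to the spoken
# phrases that may accompany it.
GRAMMAR: Dict[str, List[str]] = {
    "point_location": ["show restaurants here", "zoom in here"],
    "circle_area": ["show restaurants in this area", "zoom in here"],
    "select_entity": ["call this one", "show info for this"],
}


def build_language_model(lattice: GestureLattice,
                         grammar: Dict[str, List[str]]) -> Dict[str, float]:
    """Multimodal-parser step: derive a phrase-level language model.

    A phrase is licensed if some gesture hypothesis allows it; its cost is
    the lowest cost among the gesture hypotheses that license it.
    """
    lm: Dict[str, float] = {}
    for gesture, cost in lattice.items():
        for phrase in grammar.get(gesture, []):
            lm[phrase] = min(cost, lm.get(phrase, float("inf")))
    return lm


def recognize_speech(acoustic_costs: Dict[str, float],
                     lm: Dict[str, float]) -> Optional[str]:
    """Speech-recognizer step: pick the best phrase under the supplied model.

    Only phrases the gesture-derived model licenses are considered. Each is
    scored by acoustic cost plus language-model cost, and the lowest total wins.
    """
    totals = {p: a + lm[p] for p, a in acoustic_costs.items() if p in lm}
    if not totals:
        return None
    return min(totals, key=totals.get)
```

For example, with a gesture lattice `{"circle_area": 0.2, "point_location": 1.0}` and acoustic costs `{"show restaurants here": 0.5, "show restaurants in this area": 0.6, "call this one": 0.1}`, the acoustically best phrase "call this one" is excluded because no recognized gesture licenses it. "show restaurants in this area" (0.6 + 0.2) then beats "show restaurants here" (0.5 + 1.0). This shows how gesture evidence compensates for ambiguity in the speech input.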