A speech recognition method uses synchronous or asynchronous audio and
video data to improve speech recognition accuracy. A two-stream
factorial hidden Markov model is trained and used to identify
speech. At least one stream is derived from audio data and a second
stream is derived from mouth pattern data. Gestural or other suitable
data streams can optionally be combined with the audio and mouth pattern
streams to reduce speech recognition error rates in noisy environments.
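
As an illustration only, the scoring step of a two-stream factorial hidden Markov model might be sketched as below. This is a minimal sketch under stated assumptions, not the method's actual implementation: the function names (`joint_log_emission`, `fhmm_log_likelihood`), the stream weight `w`, and the choice to combine per-stream emission scores as a weighted sum of log-likelihoods are all hypothetical. The sketch assumes each stream (audio, mouth pattern) has its own hidden state chain with its own transition matrix, and runs the forward algorithm over the joint state of both chains.

```python
import numpy as np

def logsumexp(x, axis=None):
    """Numerically stable log(sum(exp(x))); assumes finite entries."""
    m = np.max(x, axis=axis, keepdims=True)
    s = m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))
    return np.squeeze(s, axis=axis)

def joint_log_emission(log_b_audio, log_b_video, w=0.7):
    """Combine per-stream emission log-likelihoods (hypothetical weighting).

    log_b_audio: shape (T, Na), log p(audio frame t | audio state i)
    log_b_video: shape (T, Nv), log p(mouth frame t | visual state j)
    w:           assumed audio stream weight in [0, 1]
    Returns an array of shape (T, Na, Nv).
    """
    return w * log_b_audio[:, :, None] + (1.0 - w) * log_b_video[:, None, :]

def fhmm_log_likelihood(log_pi_a, log_pi_v, log_A_a, log_A_v, log_b):
    """Forward algorithm over joint states (i, j) of two independent chains.

    log_pi_a, log_pi_v: initial state log-probabilities, shapes (Na,), (Nv,)
    log_A_a, log_A_v:   transition log-probabilities, shapes (Na, Na), (Nv, Nv)
    log_b:              joint emission scores, shape (T, Na, Nv)
    """
    # alpha[i, j] = log score of frames 1..t with audio state i, visual state j
    alpha = log_pi_a[:, None] + log_pi_v[None, :] + log_b[0]
    for t in range(1, log_b.shape[0]):
        # Sum out the previous audio state: result has shape (Na, Nv)
        step = logsumexp(alpha[:, None, :] + log_A_a[:, :, None], axis=0)
        # Sum out the previous visual state: result has shape (Na, Nv)
        step = logsumexp(step[:, :, None] + log_A_v[None, :, :], axis=1)
        alpha = step + log_b[t]
    return float(logsumexp(alpha))
```

In a recognizer built along these lines, one such model would be trained per speech unit, and the unit whose model gives the highest score for the observed audio and mouth pattern frames would be selected. Lowering `w` in noisy conditions would shift reliance toward the visual stream; that tuning strategy is likewise an assumption of this sketch.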