A method and computer-readable medium are provided that determine
predicted acoustic values for a sequence of hypothesized speech units
using modeled articulatory or vocal tract resonance (VTR) dynamics values
and a modeled relationship between the articulatory (or VTR) and acoustic values for
the same speech events. Under one embodiment, the articulatory (or VTR)
dynamics value depends on articulatory dynamics values at previous time
frames and on articulation targets. In another embodiment, the articulatory
dynamics value depends in part on an acoustic environment value such as
noise or distortion. In a third embodiment, a time constant that defines
the articulatory dynamics value is trained using a variety of
articulation styles. By modeling the articulatory or VTR dynamics value
in these manners, hyper-articulated, hypo-articulated, fast, and slow
speech can be better recognized and the amount of training data required
can be reduced.
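
The dependence of the dynamics value on previous frames, an articulation target, and a time constant can be illustrated with a minimal sketch. The abstract does not give the functional form, so the first-order recursion and the linear articulatory-to-acoustic mapping below are assumptions chosen for illustration, and the names `articulatory_trajectory`, `predict_acoustic`, `gain`, and `offset` are hypothetical.

```python
def articulatory_trajectory(targets, time_constant, z0=0.0):
    """Target-directed dynamics (assumed first-order form, not from the source).

    Each frame moves the articulatory (or VTR) value part of the way from its
    previous value toward the current target:
        z[t] = time_constant * z[t-1] + (1 - time_constant) * target[t]
    """
    if not 0.0 <= time_constant < 1.0:
        raise ValueError("time_constant must lie in [0, 1)")
    trajectory = []
    z = z0
    for target in targets:
        z = time_constant * z + (1.0 - time_constant) * target
        trajectory.append(z)
    return trajectory


def predict_acoustic(z, gain=1.0, offset=0.0):
    """Map an articulatory value to a predicted acoustic value.

    A linear mapping is assumed here purely as a placeholder for the
    modeled articulatory-to-acoustic relationship.
    """
    return gain * z + offset


if __name__ == "__main__":
    # One speech unit held for three frames with target 1.0.
    fast = articulatory_trajectory([1.0, 1.0, 1.0], time_constant=0.5)
    slow = articulatory_trajectory([1.0, 1.0, 1.0], time_constant=0.9)
    print(fast)  # [0.5, 0.75, 0.875]
    print([predict_acoustic(z, gain=2.0) for z in fast])
    print(slow[-1] < fast[-1])  # larger time constant leaves more undershoot
```

Under these assumptions, a larger time constant leaves the value further from its target within the same number of frames, which is one way to picture hypo-articulated or fast speech, while a smaller one approaches the target more closely.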