A structured generative model of speech coarticulation and reduction is
described, with a novel two-stage implementation. At the first stage, the
dynamics of formants, or vocal tract resonances (VTRs), are generated
using prior information about the resonance targets in the phone
sequence. Bi-directional temporal filtering with a finite impulse
response (FIR) filter is applied, with the segmental target sequence
serving as the FIR filter's input.
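The first stage can be sketched as follows. This is a minimal illustration, not the paper's exact filter: the target values, durations, and the symmetric two-sided exponential kernel (with assumed parameters `gamma` and `D`) are all hypothetical choices standing in for the model's trained quantities.

```python
import numpy as np

# Hypothetical per-phone VTR targets (Hz, e.g. a first-formant track)
# and per-phone durations in frames -- illustrative values only.
targets = np.array([500.0, 700.0, 300.0])
durations = np.array([10, 8, 12])

# Build the segmental (stepwise) target sequence t[k], one value per frame.
t = np.repeat(targets, durations)

# Bi-directional (non-causal) FIR smoothing: a symmetric two-sided
# exponential kernel h[k] = c * gamma**|k|, truncated at +/- D frames.
gamma, D = 0.85, 6
k = np.arange(-D, D + 1)
h = gamma ** np.abs(k)
h /= h.sum()                              # unit-gain normalization

# Filter the target sequence; edges are padded with the boundary targets,
# so the smoothed trajectory z has the same length as t.
padded = np.pad(t, D, mode="edge")
z = np.convolve(padded, h, mode="valid")  # smoothed VTR trajectory
```

Because the kernel is normalized and two-sided, each frame of `z` is a convex combination of nearby targets, so the trajectory undershoots the targets near phone boundaries, which is how coarticulation and reduction enter the hidden resonance space.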
the second stage the dynamics of speech cepstra are predicted
analytically based on the FIR filtered VTR targets. The combined system
of these two stages thus generates correlated and causally related VTR
and cepstral dynamics, in which phonetic reduction is represented explicitly
in the hidden resonance space and implicitly in the observed cepstral
space. The combined system also gives the acoustic observation
probability given a phone sequence. Using this probability, competing
phone sequences can be compared and ranked by their respective
probability values, which permits the use of the model for phonetic
recognition.
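The ranking step can be sketched as below. This is a deliberately simplified stand-in for the model's observation probability: it scores each hypothesis's predicted cepstral trajectory against the observed cepstra under an i.i.d. Gaussian residual (the paper's actual residual model and the hypothesis names are not specified here, so both are assumptions).

```python
import numpy as np

def log_likelihood(obs, pred, sigma=1.0):
    """Gaussian log-likelihood of observed cepstra given one hypothesis's
    predicted cepstral trajectory (i.i.d. residual -- a simplification)."""
    r = obs - pred
    return (-0.5 * np.sum(r ** 2) / sigma ** 2
            - obs.size * np.log(sigma * np.sqrt(2.0 * np.pi)))

# Two hypothetical phone-sequence hypotheses, each with a predicted
# cepstral trajectory (synthetic data for illustration only).
rng = np.random.default_rng(0)
obs = rng.normal(size=(30, 12))                       # observed cepstra
hyps = {
    "seq_a": obs + 0.1 * rng.normal(size=obs.shape),  # close to obs
    "seq_b": rng.normal(size=obs.shape),              # unrelated
}

# Rank hypotheses by their respective log-probability values.
scores = {name: log_likelihood(obs, pred) for name, pred in hyps.items()}
best = max(scores, key=scores.get)  # seq_a tracks obs, so it ranks first
```

In a recognizer, each candidate phone sequence would be run through both generative stages to produce its predicted cepstral trajectory, and the highest-scoring sequence would be output.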