An "Animation Synthesizer" uses trainable probabilistic models, such as
Hidden Markov Models (HMM), Artificial Neural Networks (ANN), etc., to
provide speech and text driven body animation synthesis. Probabilistic
models are trained using synchronized motion and speech inputs (e.g.,
live or recorded audio/video feeds) at various speech levels, such as
sentences, phrases, words, phonemes, sub-phonemes, etc., depending upon
the available data and the motion type or body part being modeled. The
Animation Synthesizer then uses the trained probabilistic models to
select animation trajectories for one or more different body parts
(e.g., face, head, hands, arms, etc.) based on an arbitrary text and/or
speech input. These animation trajectories are then used to synthesize a
sequence of animations for digital avatars, cartoon characters,
computer-generated anthropomorphic persons or creatures, actual motions for
physical robots, etc., that are synchronized with a speech output
corresponding to the text and/or speech input.
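
As a rough illustration of the training and synthesis pipeline described
above, the sketch below trains one Gaussian HMM per phoneme on
frame-synchronized speech and motion features, then derives a motion
trajectory for new speech input by decoding each phoneme's most likely
state sequence and reading off the motion portion of the state means.
This is a minimal sketch under stated assumptions, not the described
system: the hmmlearn library, the per-phoneme corpus layout, the feature
dimensions, and the zero-padding of unknown motion dimensions at decode
time are all illustrative choices.

    import numpy as np
    from hmmlearn import hmm  # assumed third-party HMM library

    def train_unit_models(corpus, n_states=5):
        """Train one Gaussian HMM per speech unit (e.g., phoneme).

        corpus maps each unit to a list of (speech_feats, motion_feats)
        pairs, where both arrays have shape (frames, dim) and are
        frame-synchronized, as in the training setup described above.
        """
        models = {}
        for unit, examples in corpus.items():
            # Concatenate speech and motion features per frame so the HMM
            # jointly models their correlation.
            X = np.vstack([np.hstack([s, m]) for s, m in examples])
            lengths = [s.shape[0] for s, _ in examples]
            model = hmm.GaussianHMM(n_components=n_states,
                                    covariance_type="diag")
            model.fit(X, lengths)
            models[unit] = model
        return models

    def synthesize_trajectory(models, unit_sequence, speech_feats_per_unit,
                              speech_dim):
        """Select a motion trajectory for new speech input.

        For each unit, decode the most likely state sequence from the
        speech features (unknown motion dimensions are zero-padded here,
        a simplification) and emit the motion part of each decoded
        state's mean vector as the animation trajectory.
        """
        pieces = []
        for unit, speech in zip(unit_sequence, speech_feats_per_unit):
            model = models[unit]
            motion_dim = model.means_.shape[1] - speech_dim
            obs = np.hstack([speech,
                             np.zeros((speech.shape[0], motion_dim))])
            states = model.predict(obs)
            pieces.append(model.means_[states, speech_dim:])
        return np.vstack(pieces)  # shape: (total_frames, motion_dim)

At runtime, the returned trajectory would drive the corresponding body
part (e.g., head pose) in sync with the speech output generated for the
same text and/or speech input.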