A novel method is presented for synchronizing the lips of a sketched face to an
input voice. The approach is to reuse training video as much as possible
whenever the input voice is similar to the training voice sequences. Face
sequences are first clustered from video segments; then, using sub-sequence
Hidden Markov Models, a correlation between speech signals and face shape
sequences is built.
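The clustering-and-matching stage can be illustrated with a minimal numpy sketch. This is a hypothetical simplification, not the actual system: face shapes are clustered with a toy k-means, and HMM sub-sequence scoring is replaced by a sliding Euclidean distance over acoustic feature sequences purely for brevity.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Toy k-means: cluster face-shape vectors into k prototype shapes.
    (Stand-in for the paper's face-sequence clustering step.)"""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        # Assign each shape vector to its nearest prototype.
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

def best_subsequence(train_feats, query_feats):
    """Slide the query over the training acoustic features and return the
    start index of the closest sub-sequence. A real system would score
    candidates with sub-sequence HMM likelihoods instead of raw distance."""
    n, m = len(train_feats), len(query_feats)
    dists = [np.linalg.norm(train_feats[i:i + m] - query_feats)
             for i in range(n - m + 1)]
    return int(np.argmin(dists))
```

Once the best-matching training sub-sequence is found, the corresponding stored face sequence can be emitted directly, which is what reduces frame-to-frame discontinuity.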
This reuse of video reduces the discontinuity between consecutive output faces
and yields accurate, realistic synthesized animations. The system can
synthesize faces from input audio in real time without noticeable delay.
Since acoustic features computed from the audio drive the system directly,
without any phonemic representation, the method adapts to any kind of voice,
language, or sound.
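A minimal sketch of phoneme-free acoustic feature extraction, assuming a raw waveform array: each frame is windowed and reduced to log energy and spectral centroid. The feature choice and parameters here are illustrative assumptions, not the ones used by the described system, but they show why no language model or phoneme inventory is required.

```python
import numpy as np

def acoustic_features(signal, sr=16000, frame_len=400, hop=160):
    """Frame the waveform and compute simple per-frame acoustic features
    (log energy, spectral centroid). No phoneme labels are involved, so
    the same code applies to any voice, language, or sound."""
    feats = []
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame))
        energy = np.log(np.sum(frame ** 2) + 1e-10)
        centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-10)
        feats.append([energy, centroid])
    return np.array(feats)
```

In use, these per-frame feature vectors would be streamed to the matching stage as audio arrives, which is what makes real-time operation possible.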