A speech synthesizer is provided that concatenates stored samples of
speech units without modifying the prosody of the samples. The present
invention achieves a high level of naturalness in synthesized speech by
using a carefully designed training speech corpus and by storing samples
according to the prosodic and phonetic contexts in which they occur.
In particular, some embodiments of the present invention limit the
training text to those sentences that will produce the most frequent sets
of prosodic contexts for each speech unit. Further embodiments of the
present invention provide a multi-tier mechanism for selecting the set of
samples that will produce the most natural-sounding speech.
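The abstract does not say how the training sentences are chosen. The following is a minimal sketch under stated assumptions. It assumes each candidate sentence has already been labeled with (speech unit, prosodic context) pairs, and it uses greedy set cover, a standard technique swapped in for illustration and not necessarily the claimed method, to keep sentences until the `top_k` most frequent contexts of every unit are covered. The names `frequent_targets`, `select_training_sentences`, `top_k`, and the context labels are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical (speech unit, prosodic context) label, e.g. ("a", "stressed").
UnitContext = Tuple[str, str]


def frequent_targets(sentences: Dict[str, List[UnitContext]], top_k: int) -> Set[UnitContext]:
    """Return the top_k most frequent prosodic contexts for each speech unit."""
    per_unit: Dict[str, Counter] = defaultdict(Counter)
    for pairs in sentences.values():
        for unit, ctx in pairs:
            per_unit[unit][ctx] += 1
    return {
        (unit, ctx)
        for unit, counts in per_unit.items()
        for ctx, _ in counts.most_common(top_k)
    }


def select_training_sentences(sentences: Dict[str, List[UnitContext]], top_k: int) -> List[str]:
    """Greedily keep sentences until every frequent (unit, context) target is covered."""
    targets = frequent_targets(sentences, top_k)
    remaining = dict(sentences)
    chosen: List[str] = []
    while targets and remaining:
        # Pick the sentence covering the most still-uncovered targets
        # (ties go to the earliest sentence in insertion order).
        best = max(remaining, key=lambda s: len(targets & set(remaining[s])))
        gain = targets & set(remaining.pop(best))
        if not gain:
            break  # no remaining sentence adds coverage
        chosen.append(best)
        targets -= gain
    return chosen


corpus = {
    "s1": [("a", "stressed"), ("t", "final")],
    "s2": [("a", "stressed"), ("a", "unstressed")],
    "s3": [("t", "final"), ("a", "stressed")],
    "s4": [("t", "initial")],
}
print(select_training_sentences(corpus, top_k=1))  # ['s1']
print(select_training_sentences(corpus, top_k=2))  # ['s1', 's2', 's4']
```

Greedy selection is not guaranteed to find the smallest covering subset, but it is simple and keeps the training text short. Raising `top_k` trades a larger corpus for coverage of rarer prosodic contexts.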