A speech synthesizer is provided that concatenates stored samples of speech units
without modifying the prosody of the samples. By storing samples according to the
prosodic and phonetic context in which they occur, and by using a carefully designed
training speech corpus, the present invention achieves a high level of naturalness in
synthesized speech. In particular, some embodiments of the present invention limit
the training text to those sentences that will produce the most frequent sets of
prosodic contexts for each speech unit. Further embodiments of the present invention
also provide a multi-tier selection mechanism for selecting a set of samples that
will produce the most natural-sounding speech.
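
As a rough illustration of limiting the training text to sentences that yield the most frequent prosodic contexts for each speech unit, the following minimal sketch may help. It is not the claimed method: the function name `select_training_sentences`, the parameter `top_k`, the representation of a sentence as a list of (speech unit, prosodic context) pairs, and the greedy covering strategy are all assumptions made for this example.

```python
from collections import Counter


def select_training_sentences(corpus, top_k):
    """Pick sentences that cover the top_k most frequent contexts per unit.

    corpus: dict mapping a sentence id to a list of
            (speech_unit, prosodic_context) pairs (hypothetical format).
    top_k:  number of most frequent contexts to keep for each speech unit
            (hypothetical parameter).
    """
    # Count how often each (unit, context) pair occurs across the corpus.
    counts = Counter(pair for pairs in corpus.values() for pair in pairs)

    # For each speech unit, keep only its top_k most frequent contexts.
    kept_per_unit = {}
    for (unit, context), _ in counts.most_common():
        kept = kept_per_unit.setdefault(unit, [])
        if len(kept) < top_k:
            kept.append(context)
    needed = {(u, c) for u, contexts in kept_per_unit.items() for c in contexts}

    # Greedy cover (an assumption, not taken from the source): repeatedly
    # take the sentence that supplies the most still-missing pairs.
    selected = []
    while needed:
        best = max(corpus, key=lambda s: len(needed & set(corpus[s])))
        gain = needed & set(corpus[best])
        if not gain:
            break
        selected.append(best)
        needed -= gain
    return selected


# Toy corpus with made-up units and context labels.
corpus = {
    "s1": [("a", "stressed"), ("b", "final")],
    "s2": [("a", "stressed")],
    "s3": [("a", "unstressed"), ("b", "final")],
    "s4": [("a", "stressed"), ("b", "initial")],
}
print(select_training_sentences(corpus, top_k=1))
```

With `top_k=1`, only the single most frequent context of each unit must be covered, so the reduced training text can be much smaller than the full corpus. A real system would derive the contexts from prosodic annotation rather than from hand-written labels.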