A text-to-speech conversion system includes a first module to convert text
into words, a second module to convert words into phonemes, a third module to map
phonemes to sound units, and a storage unit to store speech representations for
a library of sound units. The first, second, and third modules and the storage
unit are implemented within a single integrated circuit to reduce size and cost.
The system typically further includes a ROM to store the code for the modules,
a RAM to store the text and intermediate results, a processor to execute the code
for the modules, and a control module to direct the operation of the first, second,
and third modules. The storage unit may be implemented with a multi-level, non-volatile
analog storage array and may be programmed with a new library of speech representations
by a programming module.
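
The three-stage conversion described above can be illustrated with a minimal software sketch. It is not the claimed implementation: the lookup tables, the `unit_` naming scheme, the one-to-one mapping from phonemes to sound units, and every function and class name below are hypothetical, chosen only to show how text could flow through the modules and how the stored library could be replaced.

```python
from typing import Dict, List

# Hypothetical tables for illustration; a real system would hold far larger ones.
ABBREVIATIONS: Dict[str, str] = {"dr.": "doctor"}
PRONUNCIATIONS: Dict[str, List[str]] = {
    "doctor": ["D", "AA", "K", "T", "ER"],
    "cat": ["K", "AE", "T"],
}


class SoundUnitLibrary:
    """Stands in for the storage unit holding speech representations."""

    def __init__(self, units: Dict[str, List[int]]) -> None:
        self._units = dict(units)

    def program(self, units: Dict[str, List[int]]) -> None:
        # Stands in for the programming module: replace the whole library.
        self._units = dict(units)

    def lookup(self, unit: str) -> List[int]:
        return self._units[unit]


def text_to_words(text: str) -> List[str]:
    """First module: expand abbreviations and strip punctuation."""
    words = []
    for token in text.lower().split():
        token = ABBREVIATIONS.get(token, token).strip(".,!?;:")
        if token:
            words.append(token)
    return words


def words_to_phonemes(words: List[str]) -> List[str]:
    """Second module: dictionary lookup, spelling out unknown words (assumed fallback)."""
    phonemes: List[str] = []
    for word in words:
        phonemes.extend(PRONUNCIATIONS.get(word, list(word.upper())))
    return phonemes


def phonemes_to_units(phonemes: List[str]) -> List[str]:
    """Third module: assumed one-to-one mapping from phoneme to sound unit."""
    return [f"unit_{p}" for p in phonemes]


def synthesize(text: str, library: SoundUnitLibrary) -> List[int]:
    """Control module: run the three stages, then concatenate stored representations."""
    units = phonemes_to_units(words_to_phonemes(text_to_words(text)))
    samples: List[int] = []
    for unit in units:
        samples.extend(library.lookup(unit))
    return samples
```

In this sketch, calling `program` with a different table changes the output of `synthesize` without touching the three conversion stages, which mirrors the separation between the modules and the reprogrammable storage unit.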