An arrangement is provided for text-to-speech processing based on linguistic
prosodic models. Linguistic prosodic models are established to characterize different linguistic
prosodic characteristics. When an input text is received, a target unit sequence
is generated with a linguistic target that annotates target units in the target
unit sequence with a plurality of linguistic prosodic characteristics so that speech
synthesized in accordance with the target unit sequence and the linguistic target
has certain desired prosodic properties. A unit sequence is selected in accordance
with the target unit sequence and the linguistic target based on joint cost information
evaluated using established linguistic prosodic models. The selected unit sequence
is used to produce synthesized speech corresponding to the input text.
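
The selection step described above can be illustrated with a minimal sketch. The abstract does not say how the joint cost is computed or searched, so everything here is an assumption made for illustration: the joint cost is taken to be the sum of a target cost (mismatch between a candidate unit and the prosodic annotations in the linguistic target) and a join cost (prosodic discontinuity between adjacent units), and the cheapest sequence is found by dynamic programming (Viterbi search). The names `Unit`, `target_cost`, `join_cost`, and `select_units`, the `pitch` and `duration` features, and the scaling constants are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """A candidate speech unit with two illustrative prosodic features."""
    name: str
    pitch: float      # assumed feature, in Hz
    duration: float   # assumed feature, in ms


def target_cost(target, unit):
    """Mismatch between the annotated target prosody and a candidate unit."""
    return (abs(target["pitch"] - unit.pitch) / 100.0
            + abs(target["duration"] - unit.duration) / 100.0)


def join_cost(prev_unit, next_unit):
    """Prosodic discontinuity at the boundary between two adjacent units."""
    return abs(prev_unit.pitch - next_unit.pitch) / 100.0


def select_units(targets, candidates):
    """Return (best unit sequence, total joint cost) via Viterbi search.

    targets[i] is the prosodic annotation for position i;
    candidates[i] is the non-empty list of units available for position i.
    """
    if not targets or len(targets) != len(candidates):
        raise ValueError("targets and candidates must be non-empty and aligned")
    if any(not row for row in candidates):
        raise ValueError("every position needs at least one candidate unit")

    # cost[i][j]: cheapest total cost of any path ending at candidates[i][j]
    cost = [[target_cost(targets[0], u) for u in candidates[0]]]
    back = [[None] * len(candidates[0])]

    for i in range(1, len(targets)):
        row_cost, row_back = [], []
        for unit in candidates[i]:
            # Pick the predecessor that minimises accumulated cost plus join cost.
            k, c = min(
                ((k, cost[i - 1][k] + join_cost(prev, unit))
                 for k, prev in enumerate(candidates[i - 1])),
                key=lambda pair: pair[1],
            )
            row_cost.append(c + target_cost(targets[i], unit))
            row_back.append(k)
        cost.append(row_cost)
        back.append(row_back)

    # Trace back from the cheapest final unit.
    j = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    total = cost[-1][j]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    path.reverse()
    return path, total


if __name__ == "__main__":
    targets = [{"pitch": 120, "duration": 80}, {"pitch": 130, "duration": 90}]
    candidates = [
        [Unit("a1", 120, 80), Unit("a2", 200, 80)],
        [Unit("b1", 128, 90), Unit("b2", 130, 200)],
    ]
    best, total = select_units(targets, candidates)
    print([u.name for u in best], round(total, 2))
```

In a real system the two cost functions would be evaluated with the established linguistic prosodic models rather than the fixed absolute differences used here; the search structure would stay the same.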