Emotion is added to the synthesized speech while the prosodic features of the language are preserved. In a speech synthesis device 200, a language processor 201 generates a string of pronunciation marks from the input text, and a prosodic data generating unit 202 creates prosodic data, expressing phoneme parameters such as time duration, pitch, and sound volume, based on the string of pronunciation marks. A constraint information generating unit 203 receives the prosodic data and the string of pronunciation marks, generates constraint information that limits the changes allowed in the parameters, and adds this constraint information to the prosodic data. An emotion filter 204, fed with the prosodic data to which the constraint information has been added, changes the parameters of the prosodic data, within the constraints, in response to the emotional state information supplied to it. A waveform generating unit 205 synthesizes the speech waveform from the prosodic data whose parameters have been changed.
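The constraint-bounded filtering step can be sketched in code. The following is a minimal illustrative sketch, not the patented implementation: all names (`Phoneme`, `Constraint`, `emotion_filter`, the emotion scaling table) and the choice of per-phoneme scale bounds are assumptions introduced for illustration. It shows the core idea that the emotion filter scales duration, pitch, and volume toward an emotion target, but each scale factor is clamped to the range set by the constraint information, so the language's prosody is not distorted beyond the allowed limits.

```python
from dataclasses import dataclass

@dataclass
class Phoneme:
    symbol: str
    duration_ms: float   # time duration
    pitch_hz: float      # fundamental frequency
    volume: float        # relative sound volume

@dataclass
class Constraint:
    # Hypothetical form of unit 203's output: bounds on how much
    # a parameter may be scaled away from its original value.
    min_scale: float
    max_scale: float

def clamp(value: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, value))

# Illustrative mapping from emotional state to desired scaling of
# (duration, pitch, volume); the real values would be design choices.
EMOTION_SCALES = {
    "neutral": (1.0, 1.0, 1.0),
    "happy":   (0.9, 1.2, 1.1),   # faster, higher, louder
    "sad":     (1.2, 0.9, 0.8),   # slower, lower, softer
}

def emotion_filter(phonemes, constraints, emotion):
    """Scale each phoneme's parameters toward the emotion target,
    clamping every scale factor to the constraint range."""
    d_scale, p_scale, v_scale = EMOTION_SCALES[emotion]
    out = []
    for ph, c in zip(phonemes, constraints):
        out.append(Phoneme(
            symbol=ph.symbol,
            duration_ms=ph.duration_ms * clamp(d_scale, c.min_scale, c.max_scale),
            pitch_hz=ph.pitch_hz * clamp(p_scale, c.min_scale, c.max_scale),
            volume=ph.volume * clamp(v_scale, c.min_scale, c.max_scale),
        ))
    return out

# With tight bounds, "happy" speeds up and raises pitch only slightly:
phonemes = [Phoneme("a", duration_ms=100.0, pitch_hz=120.0, volume=1.0)]
constraints = [Constraint(min_scale=0.95, max_scale=1.1)]
filtered = emotion_filter(phonemes, constraints, "happy")
```

The resulting `filtered` phonemes would then feed the waveform generating stage; the key design point is that the constraint check happens per parameter, so an extreme emotion setting can never push a phoneme outside the range that keeps the utterance intelligible.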