A sentence or a song is to be synthesized as natural speech close
to the human voice. To this end, singing metrical data are formed in a
tag processing unit 211 in a singing synthesis unit 212 in a speech
synthesis apparatus 200 based on singing data and an analyzed text
portion. A language analysis unit 213 performs language processing on
text portions other than the singing data. For a text portion that this
language processing finds registered in a natural metrical dictionary,
the corresponding natural metrical data are selected, and
their parameters are adjusted in a metrical data adjustment unit 222 based
on phonemic segment data of a phonemic segment storage unit 223 in the
metrical data adjustment unit 222. For a text portion not registered
in the natural metrical dictionary, a phonemic symbol string is generated
in a natural metrical dictionary storage unit 214, after which metrical
data are generated in a metrical generating unit 221. A waveform
generating unit 224 concatenates the necessary phonemic segment data, based
on the natural metrical data, the metrical data, and the singing metrical data,
to generate speech waveform data.
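The three routes described above (singing data, text registered in the natural metrical dictionary, and unregistered text) can be sketched as a toy Python pipeline. This is an illustration under stated assumptions, not the patented apparatus. The `<sing>` tag syntax, the `phoneme/pitchHz/durMs` token format for singing data, the mean-pitch adjustment rule, the one-letter-per-phoneme symbol string, and the one-pitch-period segment store are all hypothetical stand-ins. "Metrical data" is modeled here as a per-phoneme duration and pitch, and every function name is illustrative only.

```python
import math
import re
from dataclasses import dataclass

SAMPLE_RATE = 16000  # Hz (assumed)

# Hypothetical inline markup for singing data; the abstract does not specify a tag format.
SING_TAG = re.compile(r"<sing>(.*?)</sing>", re.DOTALL)


@dataclass
class Prosody:
    """Metrical data: one (phoneme, duration_ms, pitch_hz) tuple per phoneme."""
    units: list


def tag_processing(text):
    """Split input into ordered ('sing', body) and ('text', body) segments."""
    segments, pos = [], 0
    for m in SING_TAG.finditer(text):
        if text[pos:m.start()].strip():
            segments.append(("text", text[pos:m.start()].strip()))
        segments.append(("sing", m.group(1).strip()))
        pos = m.end()
    if text[pos:].strip():
        segments.append(("text", text[pos:].strip()))
    return segments


def singing_prosody(body):
    """Singing metrical data from hypothetical 'phoneme/pitchHz/durMs' tokens."""
    units = []
    for tok in body.split():
        ph, hz, ms = tok.split("/")
        units.append((ph, float(ms), float(hz)))
    return Prosody(units)


def adjust(prosody, base_pitch):
    """Assumed adjustment rule: scale the stored pitch contour so its mean
    matches the base pitch of the phonemic segment inventory."""
    mean = sum(p for _, _, p in prosody.units) / len(prosody.units)
    return Prosody([(ph, d, p * base_pitch / mean) for ph, d, p in prosody.units])


def rule_prosody(text):
    """Fallback for unregistered text: letters stand in for the phonemic
    symbol string; flat 80 ms durations and a linearly falling pitch."""
    phonemes = [c for c in text.lower() if c.isalpha()]
    n = len(phonemes)
    return Prosody([(ph, 80.0, 140.0 - 30.0 * i / max(n - 1, 1))
                    for i, ph in enumerate(phonemes)])


def synthesize_units(prosody, segments_db):
    """Concatenate one stored pitch period per phoneme, resampled to the
    target pitch and repeated until the target duration is filled."""
    out = []
    for ph, ms, hz in prosody.units:
        period = segments_db[ph]
        n = max(1, round(SAMPLE_RATE / hz))  # samples per period at target pitch
        resampled = [period[int(i * len(period) / n)] for i in range(n)]
        total = round(SAMPLE_RATE * ms / 1000)
        out.extend(resampled[i % n] for i in range(total))
    return out


def toy_segment_db(period_len=100):
    """Placeholder segment store: the same sine-wave period for every letter."""
    period = [math.sin(2 * math.pi * k / period_len) for k in range(period_len)]
    return {c: period for c in "abcdefghijklmnopqrstuvwxyz"}


def synthesize(text, natural_dict, segments_db, base_pitch=120.0):
    """Route each segment to the singing, dictionary, or rule-based path."""
    wave = []
    for kind, body in tag_processing(text):
        if kind == "sing":
            prosody = singing_prosody(body)
        elif body.lower() in natural_dict:  # stand-in for language analysis
            prosody = adjust(natural_dict[body.lower()], base_pitch)
        else:
            prosody = rule_prosody(body)
        wave.extend(synthesize_units(prosody, segments_db))
    return wave
```

For example, with `natural_dict = {"hello": Prosody([...])}`, calling `synthesize("hi <sing>a/440/500 o/220/250</sing> hello", natural_dict, toy_segment_db())` sends "hi" through the rule-based fallback, the tagged span through the singing path, and "hello" through the dictionary-and-adjustment path. A real system would use recorded segments and a proper pitch-modification method rather than this period-resampling shortcut.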