Input text data undergoes language analysis to generate prosody, and a speech
database is searched for a synthesis unit on the basis of the prosody. A modification
distortion of the retrieved synthesis unit and the concatenation distortions incurred in connecting
that synthesis unit to those of the preceding phoneme are computed, and a distortion
determination unit weights the modification and concatenation distortions to determine
the total distortion. An N-best determination unit uses the A* search algorithm to obtain
the N best paths, that is, the N paths with the smallest total distortion, and a registration unit
determination unit selects synthesis units to be registered in a synthesis unit
inventory on the basis of the N best paths, in order of frequency of occurrence,
and registers them in the synthesis unit inventory.
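
The selection procedure summarized above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the weighted-sum form of the total distortion, the default weights, the lattice layout, and all function and variable names (total_distortion, n_best_paths, select_units, mod, con) are assumptions made for illustration. The sketch searches a lattice of candidate units with A*, using the exact remaining cost as the heuristic so that complete paths are popped in order of increasing total distortion, and then ranks units by how often they occur in the N best paths.

```python
import heapq
from collections import Counter
from itertools import count


def total_distortion(d_mod, d_con, w_mod=0.5, w_con=0.5):
    """Weighted sum of modification and concatenation distortion (assumed form)."""
    return w_mod * d_mod + w_con * d_con


def n_best_paths(lattice, mod, con, n, w_mod=0.5, w_con=0.5):
    """Return up to n (cost, path) pairs with the smallest total distortion.

    lattice: one list of candidate unit ids per phoneme position (hypothetical layout)
    mod[u]: modification distortion of unit u
    con[(p, u)]: concatenation distortion when unit u follows unit p
    """
    last = len(lattice) - 1
    # Heuristic h[i][u]: cheapest cost of completing a path after unit u at
    # position i, computed by a backward pass. It is exact, hence admissible.
    h = [dict() for _ in lattice]
    for u in lattice[last]:
        h[last][u] = 0.0
    for i in range(last - 1, -1, -1):
        for u in lattice[i]:
            h[i][u] = min(
                total_distortion(mod[v], con[(u, v)], w_mod, w_con) + h[i + 1][v]
                for v in lattice[i + 1]
            )

    tie = count()  # tie-breaker so the heap never compares paths directly
    heap = []
    for u in lattice[0]:
        g = total_distortion(mod[u], 0.0, w_mod, w_con)  # no predecessor yet
        heapq.heappush(heap, (g + h[0][u], next(tie), g, (u,)))

    results = []
    while heap and len(results) < n:
        _, _, g, path = heapq.heappop(heap)
        i = len(path) - 1
        if i == last:
            results.append((g, path))  # complete paths pop in cost order
            continue
        for v in lattice[i + 1]:
            g2 = g + total_distortion(mod[v], con[(path[-1], v)], w_mod, w_con)
            heapq.heappush(heap, (g2 + h[i + 1][v], next(tie), g2, path + (v,)))
    return results


def select_units(paths, k):
    """Pick the k units that occur most often across the N best paths."""
    counts = Counter(u for _, path in paths for u in path)
    return [u for u, _ in counts.most_common(k)]


# Toy data: two phoneme positions with two candidate units each.
lattice = [["a1", "a2"], ["b1", "b2"]]
mod = {"a1": 1.0, "a2": 2.0, "b1": 1.0, "b2": 3.0}
con = {("a1", "b1"): 4.0, ("a1", "b2"): 0.0,
       ("a2", "b1"): 0.0, ("a2", "b2"): 2.0}
paths = n_best_paths(lattice, mod, con, n=3)
inventory = select_units(paths, k=2)
```

In this toy example the three best paths contain a1 and b1 twice each, so those two units would be registered first. A real system would compute the distortions from the prosody and the speech database rather than from fixed tables.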