A speech processing apparatus according to an embodiment of the invention
includes a conversion-source-speaker speech-unit database; a
voice-conversion-rule-learning-data generating means; and a
voice-conversion-rule learning means, with which it generates voice
conversion rules. The voice-conversion-rule-learning-data generating
means includes a conversion-target-speaker speech-unit extracting means;
an attribute-information generating means; a conversion-source-speaker
speech-unit database; and a conversion-source-speaker speech-unit
selection means. The conversion-source-speaker speech-unit selection
means selects conversion-source-speaker speech units corresponding to
conversion-target-speaker speech units based on the mismatch between the
attribute information of the conversion-target-speaker speech units and
that of the conversion-source-speaker speech units, so that the voice
conversion rules are learned from the selected pairs of
conversion-target-speaker speech units and conversion-source-speaker
speech units.
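
The selection step can be illustrated with a minimal sketch. It is not the claimed implementation: the attribute set (phoneme label, mean fundamental frequency, duration), the weighted relative-difference cost, and all names such as `SpeechUnit`, `attribute_mismatch`, and `select_source_units` are assumptions made for illustration. For each conversion-target-speaker speech unit, the sketch picks the conversion-source-speaker speech unit with the smallest attribute mismatch and returns the resulting pairs as learning data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechUnit:
    # Hypothetical attribute information; the source does not list specific attributes.
    unit_id: str
    phoneme: str        # phoneme label of the unit
    f0_hz: float        # mean fundamental frequency in Hz
    duration_ms: float  # unit duration in milliseconds


def attribute_mismatch(target, source, w_f0=1.0, w_dur=1.0):
    """Assumed cost: weighted sum of relative F0 and duration differences."""
    f0_cost = abs(target.f0_hz - source.f0_hz) / target.f0_hz
    dur_cost = abs(target.duration_ms - source.duration_ms) / target.duration_ms
    return w_f0 * f0_cost + w_dur * dur_cost


def select_source_units(targets, source_db):
    """Pair each target unit with the lowest-mismatch source unit of the same phoneme."""
    pairs = []
    for target in targets:
        candidates = [s for s in source_db if s.phoneme == target.phoneme]
        if not candidates:
            continue  # no source unit with this phoneme; skip the target unit
        best = min(candidates, key=lambda s: attribute_mismatch(target, s))
        pairs.append((target, best))
    return pairs


# Toy data, invented for the example.
source_db = [
    SpeechUnit("s1", "a", 120.0, 80.0),
    SpeechUnit("s2", "a", 200.0, 95.0),
    SpeechUnit("s3", "i", 210.0, 60.0),
]
targets = [
    SpeechUnit("t1", "a", 210.0, 100.0),
    SpeechUnit("t2", "i", 220.0, 70.0),
    SpeechUnit("t3", "u", 180.0, 90.0),
]

pairs = select_source_units(targets, source_db)
pair_ids = [(t.unit_id, s.unit_id) for t, s in pairs]
```

In this sketch the selected pairs would then be handed to the rule-learning stage; how the voice conversion rules themselves are learned is outside the scope of the example.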