A method for defining a distance measure in a text-to-speech conversion
system by means of a Gaussian Mixture Model (GMM).
According to an embodiment, the text to be converted to speech is analyzed
to obtain a text with descriptive prosody annotations; the samples in the
obtained text are clustered; and a GMM is generated for each cluster, so
that the distance between a sample and the corresponding GMM can be
determined.
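The clustering and per-cluster GMM steps can be sketched as follows. This is a minimal illustration that rests on assumptions the abstract does not specify. Each sample is treated as a numeric prosody feature vector, and the example values (F0 in Hz, duration in seconds) are made up. Clusters are formed by grouping samples that share the same descriptive prosody label rather than by a data-driven algorithm such as k-means. Each cluster's GMM uses diagonal covariances and is trained with a plain EM loop. The distance is taken to be the negative log-likelihood of the sample under the GMM, D(x) = -log sum_m w_m N(x; mu_m, Sigma_m). The names `DiagGMM`, `build_cluster_models` and `gmm_distance` are hypothetical.

```python
import math
import random
from collections import defaultdict


def log_gauss_diag(x, mean, var):
    # Log density of a diagonal-covariance Gaussian evaluated at point x.
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def logsumexp(vals):
    # Numerically stable log(sum(exp(vals))).
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


class DiagGMM:
    """Hypothetical minimal diagonal-covariance GMM trained with EM."""

    def __init__(self, n_components=2, n_iter=50, var_floor=1e-3, seed=0):
        self.k = n_components
        self.n_iter = n_iter
        self.var_floor = var_floor  # keeps variances from collapsing to zero
        self.rng = random.Random(seed)

    def fit(self, data):
        k = min(self.k, len(data))
        dim = len(data[0])
        # Init: means from randomly chosen samples, unit variances, equal weights.
        self.means = [list(p) for p in self.rng.sample(data, k)]
        self.vars = [[1.0] * dim for _ in range(k)]
        self.weights = [1.0 / k] * k
        for _ in range(self.n_iter):
            # E-step: responsibility of each component for each sample.
            resp = []
            for x in data:
                logs = [math.log(w) + log_gauss_diag(x, m, v)
                        for w, m, v in zip(self.weights, self.means, self.vars)]
                total = logsumexp(logs)
                resp.append([math.exp(l - total) for l in logs])
            # M-step: re-estimate weights, means and (floored) variances.
            for j in range(k):
                nj = sum(r[j] for r in resp) + 1e-12
                self.weights[j] = nj / len(data)
                self.means[j] = [sum(r[j] * x[d] for r, x in zip(resp, data)) / nj
                                 for d in range(dim)]
                self.vars[j] = [max(self.var_floor,
                                    sum(r[j] * (x[d] - self.means[j][d]) ** 2
                                        for r, x in zip(resp, data)) / nj)
                                for d in range(dim)]
        return self

    def log_likelihood(self, x):
        return logsumexp([math.log(w) + log_gauss_diag(x, m, v)
                          for w, m, v in zip(self.weights, self.means, self.vars)])


def gmm_distance(sample, gmm):
    # Assumed distance: negative log-likelihood of the sample under the GMM.
    return -gmm.log_likelihood(sample)


def build_cluster_models(samples, n_components=2):
    # samples: list of (prosody_label, feature_vector) pairs.
    # Assumption: one cluster per descriptive prosody label.
    clusters = defaultdict(list)
    for label, feats in samples:
        clusters[label].append(feats)
    return {label: DiagGMM(n_components).fit(feats)
            for label, feats in clusters.items()}


# Toy usage with made-up [F0 (Hz), duration (s)] feature vectors.
samples = [
    ("rising", [220.0, 0.30]), ("rising", [230.0, 0.28]), ("rising", [225.0, 0.33]),
    ("falling", [140.0, 0.20]), ("falling", [135.0, 0.22]), ("falling", [150.0, 0.18]),
]
models = build_cluster_models(samples)
probe = [228.0, 0.30]
distances = {label: gmm_distance(probe, gmm) for label, gmm in models.items()}
```

A smaller distance means the sample is better explained by that cluster's model; in the toy data, the high-F0 probe is closer to the "rising" model than to the "falling" one. With realistic data sizes, a library GMM implementation and a proper clustering step would normally replace this hand-written EM loop.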