A speech reference enrollment method involves requesting that a user
speak a word; detecting a first utterance; requesting that the user speak
the word again; detecting a second utterance; determining a first
similarity between the first utterance and the second utterance; when the
first similarity is less than a predetermined similarity, requesting that
the user speak the word a third time; detecting a third utterance;
determining a second similarity between the first utterance and the third
utterance; and when the second similarity is greater than or equal to the
predetermined similarity, creating a reference.
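
The enrollment flow described above can be sketched in Python as follows.
This is a minimal illustration under stated assumptions, not the claimed
implementation: the names `enroll`, `capture`, and `cosine_similarity` are
hypothetical, utterances are assumed to be plain feature vectors, cosine
similarity stands in for whatever similarity measure is actually used, and
the "reference" is represented simply as the pair of matching utterances.
The abstract does not say what happens when the first two utterances
already match, or when the third utterance also fails to match; the sketch
assumes a reference is created in the first case and that enrollment
returns nothing in the second.

```python
import math
from typing import Callable, Optional, Sequence, Tuple

# Assumption: an utterance is a fixed-length feature vector.
Utterance = Sequence[float]
Reference = Tuple[Utterance, Utterance]


def cosine_similarity(a: Utterance, b: Utterance) -> float:
    """Stand-in similarity measure (an assumption, not from the source)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def enroll(
    word: str,
    capture: Callable[[str], Utterance],
    threshold: float,
    similarity: Callable[[Utterance, Utterance], float] = cosine_similarity,
) -> Optional[Reference]:
    """Prompt for a word up to three times and build a reference.

    `capture` is a hypothetical callback that requests the word from the
    user and returns the detected utterance.
    """
    first = capture(word)   # request the word, detect first utterance
    second = capture(word)  # request the word, detect second utterance

    if similarity(first, second) >= threshold:
        # Assumption: the abstract does not state this branch explicitly.
        return (first, second)

    # First similarity is below the predetermined similarity:
    # request the word again and compare the third utterance to the first.
    third = capture(word)
    if similarity(first, third) >= threshold:
        return (first, third)

    # Assumption: the abstract does not cover a second mismatch.
    return None
```

For example, with utterances `[1.0, 0.0]`, `[0.0, 1.0]`, and `[1.0, 0.1]`
and a threshold of 0.9, the first comparison fails, the third utterance is
requested, and the reference is built from the first and third utterances.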