The present invention comprises a methodology for implementing a vocabulary set
for use in a speech recognition system, and may preferably include a recognizer
for analyzing utterances from the vocabulary set to generate N-best lists of recognition
candidates. The N-best lists may then be utilized to create an acoustical matrix
configured to relate the utterances to top recognition candidates from the N-best
lists, as well as a lexical matrix configured to relate the utterances to the top
recognition candidates from the N-best lists only when second-highest recognition
candidates from the N-best lists are correct recognition results. An utterance
ranking may then preferably be created according to composite individual error/accuracy
values for each of the utterances. The composite individual error/accuracy values
may preferably be derived from both the acoustical matrix and the lexical matrix.
Lowest-ranked utterances from the foregoing utterance ranking may preferably be
repeatedly eliminated from the vocabulary set when a total error/accuracy value
for all of the utterances fails to exceed a predetermined threshold value.
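
The foregoing procedure may be illustrated by the following minimal sketch, which is not the claimed implementation. It assumes that each recognition result is available as a pair of the spoken vocabulary item and its N-best list, that the composite individual error value is a weighted sum of the misrecognition counts from the two matrices, and that eliminating an item may be approximated by filtering the existing N-best lists rather than re-running the recognizer. The function names, the `lexical_weight` parameter, and the sample data are hypothetical.

```python
from collections import defaultdict


def build_matrices(results):
    """Build both matrices from (spoken_item, n_best_list) pairs."""
    acoustical = defaultdict(lambda: defaultdict(int))
    lexical = defaultdict(lambda: defaultdict(int))
    for spoken, n_best in results:
        if not n_best:
            continue  # assumption: an empty N-best list is skipped, not scored
        top = n_best[0]
        # Acoustical matrix: spoken item versus top recognition candidate.
        acoustical[spoken][top] += 1
        # Lexical matrix: same relation, but only when the second-highest
        # candidate is the correct result.
        if len(n_best) > 1 and n_best[1] == spoken:
            lexical[spoken][top] += 1
    return acoustical, lexical


def total_accuracy(acoustical):
    """Fraction of scored utterances whose top candidate is correct."""
    total = sum(sum(row.values()) for row in acoustical.values())
    correct = sum(row.get(item, 0) for item, row in acoustical.items())
    return correct / total if total else 1.0


def composite_errors(vocab, acoustical, lexical, lexical_weight=0.5):
    """Hypothetical composite error: weighted misrecognition counts per item."""
    errors = {item: 0.0 for item in vocab}
    for matrix, weight in ((acoustical, 1.0), (lexical, lexical_weight)):
        for spoken, row in matrix.items():
            for top, count in row.items():
                if top != spoken and spoken in errors:
                    errors[spoken] += weight * count
    return errors


def restrict(results, vocab):
    """Approximate re-recognition by dropping eliminated items everywhere."""
    return [(spoken, [c for c in n_best if c in vocab])
            for spoken, n_best in results if spoken in vocab]


def prune_vocabulary(vocab, results, threshold):
    """Eliminate the lowest-ranked item while accuracy fails to exceed threshold."""
    vocab = set(vocab)
    while len(vocab) > 1:
        acoustical, lexical = build_matrices(restrict(results, vocab))
        if total_accuracy(acoustical) > threshold:
            break
        errors = composite_errors(vocab, acoustical, lexical)
        # Lowest-ranked item = highest composite error; ties break alphabetically.
        worst = max(sorted(vocab), key=lambda item: errors[item])
        vocab.remove(worst)
    return vocab


# Hypothetical sample data: "call" and "hall" are frequently confused.
vocab = {"call", "hall", "stop", "play"}
results = [
    ("call", ["call", "hall"]),
    ("call", ["hall", "call"]),
    ("hall", ["call", "hall"]),
    ("hall", ["call", "hall"]),
    ("hall", ["hall", "call"]),
    ("stop", ["stop", "play"]),
    ("play", ["play", "stop"]),
]
kept = prune_vocabulary(vocab, results, threshold=0.9)
print(sorted(kept))
```

In this sketch the total value is treated as an accuracy, so elimination continues while that accuracy does not exceed the threshold; if an error value were used instead, the comparison would be reversed.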