Two statistics are disclosed for determining the quality of language
models: acoustic perplexity and the synthetic acoustic word error rate
(SAWER). Both depend upon methods for computing the acoustic
confusability of words. Models of acoustic data may be substituted for
real acoustic data in order to determine acoustic confusability. An
evaluation model is
created, a synthesizer model is created, and a matrix is determined from
the evaluation and synthesizer models. Each of the evaluation and
synthesizer models is a hidden Markov model. Once the matrix is
determined, a confusability calculation may be performed. Several
methods may be used to determine synthetic likelihoods. The
confusability may be normalized and smoothed, and methods are disclosed
that increase the speed of the matrix inversion and the confusability
calculation. A method for caching and reusing computations for similar
words is disclosed. Acoustic perplexity and SAWER are determined and
applied.
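
The abstract names acoustic perplexity but does not state its formula. As a minimal sketch, assume it is the ordinary perplexity computed on a posterior that reweights each language-model probability by an acoustic confusability score, so that the posterior of the spoken word w given history h is p(a(w)|w) P(w|h) divided by the sum over candidate words x of p(a(w)|x) P(x|h). That definition, the function name `acoustic_perplexity`, the three-word vocabulary, and the toy confusability tables are all assumptions made for illustration and are not taken from the disclosure.

```python
import math


def acoustic_perplexity(corpus, lm_prob, confusability, vocab):
    """Perplexity of a corpus under an acoustically reweighted posterior.

    corpus        -- list of (history, spoken_word) pairs
    lm_prob       -- function (word, history) -> P(word | history)
    confusability -- confusability[spoken][candidate], standing in for
                     p(a(spoken) | candidate); hypothetical toy values here
    vocab         -- list of all candidate words
    """
    log_total = 0.0
    for history, word in corpus:
        # Score of the word that was actually spoken.
        numer = confusability[word][word] * lm_prob(word, history)
        # Total score of every word that could be mistaken for it.
        denom = sum(confusability[word][x] * lm_prob(x, history) for x in vocab)
        log_total += math.log(numer / denom)
    # Inverse geometric mean of the per-word posteriors.
    return math.exp(-log_total / len(corpus))


VOCAB = ["bear", "bare", "cat"]


def uniform_lm(word, history):
    # Toy language model: every word equally likely, history ignored.
    return 1.0 / len(VOCAB)


# Each word sounds only like itself.
NO_CONFUSION = {w: {x: (1.0 if x == w else 0.0) for x in VOCAB} for w in VOCAB}
# Every word sounds like every other word.
TOTAL_CONFUSION = {w: {x: 1.0 for x in VOCAB} for w in VOCAB}
# "bear" and "bare" are indistinguishable; "cat" is distinct.
HOMOPHONES = {
    "bear": {"bear": 1.0, "bare": 1.0, "cat": 0.0},
    "bare": {"bear": 1.0, "bare": 1.0, "cat": 0.0},
    "cat": {"bear": 0.0, "bare": 0.0, "cat": 1.0},
}

CORPUS = [((), "bear"), ((), "cat"), ((), "bare")]

for name, table in [("none", NO_CONFUSION),
                    ("homophones", HOMOPHONES),
                    ("total", TOTAL_CONFUSION)]:
    print(name, acoustic_perplexity(CORPUS, uniform_lm, table, VOCAB))
```

Under this assumed definition the two limiting cases behave as expected: with no confusion the acoustics identify every word and the statistic equals 1, and with total confusion the acoustics carry no information and it reduces to the lexical perplexity of the language model (3 for the uniform three-word model). The homophone table falls between the two, which is the sense in which the statistic reflects acoustic confusability as well as the language model.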