In accordance with the present invention, a speech recognition method is
disclosed. The method uses a microphone to receive audible sounds input by a
user into a first computing device having a program with a database consisting of (i)
digital representations of known audible sounds and associated
alphanumeric representations of the known audible sounds and (ii) digital
representations of known audible sounds corresponding to
mispronunciations resulting from known classes of mispronounced words and
phrases. The method is performed by receiving the audible sounds in the
form of the electrical output of the microphone. A particular audible
sound to be recognized is converted into a digital representation of the
audible sound. The digital representation of the particular audible sound
is then compared to the digital representations of the known audible
sounds to determine which of those known audible sounds is most likely to
be the particular audible sound. A speech recognition output consisting of
the alphanumeric
representation associated with the audible sound most likely to be the
particular audible sound is then produced. An error indication is then
received from the user indicating that there is an error in recognition.
The user also indicates the proper alphanumeric representation of the
particular audible sound. This allows the system to determine whether the
error is a result of a known type or instance of mispronunciation. In
response to a determination that the error corresponds to a known type or
instance of mispronunciation, the system presents an interactive training
program from the computer to the user to enable the user to correct such
mispronunciation.
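
The flow described above can be illustrated with a minimal sketch. It is not the disclosed implementation; it assumes that each digital representation is a short feature vector, that "most likely" means smallest Euclidean distance, and that each mispronunciation entry in part (ii) of the database is tagged with the intended text and a class label. The names `KnownSound`, `recognize`, `classify_error`, the `max_distance` threshold, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from math import dist
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class KnownSound:
    features: Tuple[float, ...]  # digital representation (assumed feature vector)
    text: str                    # associated alphanumeric representation
    # Hypothetical tag: set only for part (ii) entries (known mispronunciations).
    mispronunciation_class: Optional[str] = None


def recognize(sample: Sequence[float], database: Sequence[KnownSound]) -> str:
    """Return the text of the part (i) entry closest to the sample."""
    correct_entries = [e for e in database if e.mispronunciation_class is None]
    best = min(correct_entries, key=lambda e: dist(sample, e.features))
    return best.text


def classify_error(
    sample: Sequence[float],
    proper_text: str,
    database: Sequence[KnownSound],
    max_distance: float = 0.5,  # assumed acceptance threshold
) -> Optional[str]:
    """After the user reports an error and supplies the proper text, return
    the known mispronunciation class that explains it, or None."""
    candidates = [
        e for e in database
        if e.mispronunciation_class is not None and e.text == proper_text
    ]
    if not candidates:
        return None
    best = min(candidates, key=lambda e: dist(sample, e.features))
    if dist(sample, best.features) <= max_distance:
        return best.mispronunciation_class
    return None


# Illustrative data only.
DATABASE = [
    KnownSound((1.0, 0.0), "three"),
    KnownSound((0.0, 1.0), "tree"),
    KnownSound((0.2, 0.9), "three", "th-as-t"),
]

sample = (0.1, 0.95)
output = recognize(sample, DATABASE)                     # recognition output
error_class = classify_error(sample, "three", DATABASE)  # user says "three"
if error_class is not None:
    print(f"Recognized '{output}'; start training for class '{error_class}'")
```

In this sketch a non-`None` return from `classify_error` stands in for the determination that triggers the interactive training program; a real system would likely use a statistical acoustic model rather than a raw distance.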