A computer-based detection (e.g., speech recognition) system combines a word decoder
and subword decoder to detect words (or phrases) in a spoken input provided by
a user into a microphone connected to the detection system. The word decoder detects
words by comparing an input pattern (e.g., of hypothetical word matches) to reference
patterns (e.g., words). The subword decoder compares an input pattern (e.g., hypothetical
word matches based on subword or phoneme recognition) to reference patterns (e.g.,
words) based on a word pronunciation distance measure that indicates how close
each input pattern is to matching each reference pattern. The subword decoder sorts
the source set of reference patterns by how closely each reference pattern matches
the input pattern, as determined by the generated pattern comparisons.
The word decoder and subword decoder each provide an N-best list of hypothetical
matches to the spoken input. A list fusion module of the detection system selectively
combines the two N-best lists to produce a final or combined N-best list. The final
or combined list has a predefined number of matches.
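
The subword ranking and list fusion described above can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: the word pronunciation distance is approximated here by a plain Levenshtein distance over phoneme symbols, and the fusion rule is a simple rank-wise interleave that drops duplicates. The names phoneme_distance, subword_n_best, and fuse_n_best, as well as the toy lexicon, are hypothetical and chosen only for illustration.

```python
from itertools import zip_longest
from typing import Dict, List, Sequence, Tuple


def phoneme_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance over phoneme symbols.

    Assumption: a stand-in for the word pronunciation distance measure,
    which the abstract does not define.
    """
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def subword_n_best(decoded: Sequence[str],
                   lexicon: Dict[str, List[str]],
                   n: int) -> List[Tuple[str, int]]:
    """Sort reference words by closeness to the decoded phonemes; keep the top n."""
    scored = [(word, phoneme_distance(decoded, phones))
              for word, phones in lexicon.items()]
    # Smaller distance means a closer match; ties are broken alphabetically.
    scored.sort(key=lambda pair: (pair[1], pair[0]))
    return scored[:n]


def fuse_n_best(word_list: List[str],
                subword_list: List[str],
                size: int) -> List[str]:
    """Interleave two ranked lists by rank, skipping duplicates, up to `size`.

    Assumption: one possible way to "selectively combine" the lists.
    """
    if size <= 0:
        return []
    fused: List[str] = []
    seen = set()
    for pair in zip_longest(word_list, subword_list):
        for word in pair:
            if word is None or word in seen:
                continue
            fused.append(word)
            seen.add(word)
            if len(fused) == size:
                return fused
    return fused


if __name__ == "__main__":
    # Hypothetical toy lexicon mapping words to phoneme sequences.
    lexicon = {
        "cat": ["k", "ae", "t"],
        "cab": ["k", "ae", "b"],
        "bat": ["b", "ae", "t"],
        "dog": ["d", "ao", "g"],
    }
    subword_hyps = subword_n_best(["k", "ae", "t"], lexicon, n=3)
    word_hyps = ["cap", "cat", "cab"]  # assumed word-decoder output
    print(fuse_n_best(word_hyps, [w for w, _ in subword_hyps], size=4))
```

Interleaving by rank avoids comparing scores across the two decoders, which may not share a scale; a system that normalizes scores could instead merge the lists by score.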