Methods are given for improving discriminative training of hidden Markov
models for continuous speech recognition. For a mixture component of a
hidden Markov model state, a gradient adjustment to the standard deviation
of the mixture component is calculated. If the calculated gradient
adjustment is greater than a first threshold amount, the standard deviation
is adjusted using the first threshold. If the calculated gradient adjustment
is less than a second threshold amount, the standard deviation is adjusted
using the second threshold. Otherwise, the standard deviation is adjusted
using the calculated gradient adjustment.
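
The three-way rule above amounts to clipping the gradient adjustment before it is applied. The following is a minimal sketch of that rule, not the method as claimed. It assumes that the first threshold acts as an upper bound, that the second threshold acts as a lower bound, and that the adjustment is added to the standard deviation; the text does not state how the adjustment is applied. The names clip_adjustment and update_std_dev are hypothetical.

```python
def clip_adjustment(gradient_adjustment, first_threshold, second_threshold):
    """Limit a gradient adjustment to the range set by the two thresholds.

    Assumption: first_threshold is the upper limit and second_threshold
    is the lower limit, so second_threshold <= first_threshold.
    """
    if gradient_adjustment > first_threshold:
        # Adjustment too large: use the first threshold instead.
        return first_threshold
    if gradient_adjustment < second_threshold:
        # Adjustment too small: use the second threshold instead.
        return second_threshold
    # Within both thresholds: use the calculated adjustment unchanged.
    return gradient_adjustment


def update_std_dev(std_dev, gradient_adjustment, first_threshold, second_threshold):
    """Apply the clipped adjustment to one mixture component's standard deviation.

    Assumption: the adjustment is additive. This is an illustrative choice,
    not something the text specifies.
    """
    return std_dev + clip_adjustment(
        gradient_adjustment, first_threshold, second_threshold
    )
```

For example, with a first threshold of 0.25 and a second threshold of -0.25, a calculated adjustment of 0.5 would be replaced by 0.25, while an adjustment of 0.125 would be applied as calculated.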