A speech enhancement method, including the steps of: (a) segmenting an
input speech signal into a plurality of frames and transforming each frame
signal into a signal of the frequency domain; (b) computing the
signal-to-noise ratio of a current frame, and computing the signal-to-noise
ratio of a frame immediately preceding the current frame; (c) computing
the predicted signal-to-noise ratio of the current frame, which is
predicted based on the preceding frame, and computing the speech absence
probability using the signal-to-noise ratio and the predicted signal-to-noise
ratio of the current frame; (d) correcting the two signal-to-noise ratios
obtained in the step (b) based on the speech absence probability computed
in the step (c); (e) computing the gain of the current frame with the two
corrected signal-to-noise ratios obtained in the step (d), and multiplying
the speech spectrum of the current frame by the computed gain; (f)
estimating the noise and speech power for the next frame to calculate the
predicted signal-to-noise ratio for the next frame, and providing the
predicted signal-to-noise ratio for the next frame as the predicted
signal-to-noise ratio of the current frame for the step (c); and (g)
transforming the resulting spectrum of the step (e) into a signal of the time
domain. The noise spectrum is estimated in speech presence intervals based
on the speech absence probability, as well as in speech absence intervals,
and the predicted signal-to-noise ratio and the gain are updated for each
channel of each frame according to the noise spectrum estimate, which in turn
improves the speech spectrum in various noise environments.
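
The flow of steps (a) through (g) can be illustrated with a minimal NumPy sketch. The abstract gives no formulas, so every formula below is an assumption swapped in from common practice rather than the claimed method: a likelihood-ratio form for the speech absence probability, scaling by the speech presence probability as the "correction", a Wiener-type gain, and first-order recursive smoothing for the noise and speech power. The names `enhance`, `q`, `alpha`, `zeta` and `init_frames` are hypothetical, each FFT bin is treated as one "channel", and the input is assumed to be at least one frame long and to begin with noise-only frames.

```python
import numpy as np

def enhance(x, frame_len=256, q=0.5, alpha=0.98, zeta=0.9, init_frames=4):
    """Illustrative sketch of steps (a)-(g); all formulas are assumptions."""
    hop = frame_len // 2
    # Periodic Hann window: copies overlapped by 50% sum to one.
    win = np.hanning(frame_len + 1)[:-1]
    n_frames = (len(x) - frame_len) // hop + 1

    # (a) segment into frames and transform each frame to the frequency domain
    frames = np.stack([x[m * hop:m * hop + frame_len] * win
                       for m in range(n_frames)])
    Y = np.fft.rfft(frames, axis=1)
    P = np.abs(Y) ** 2

    # Assumption: the leading frames contain noise only.
    noise = P[:init_frames].mean(axis=0) + 1e-12
    speech = np.zeros_like(noise)       # smoothed speech power estimate
    prev_clean = np.zeros_like(noise)   # enhanced power of the preceding frame
    pred_snr = np.zeros_like(noise)     # predicted SNR handed over by step (f)
    out = np.zeros(len(x))

    for m in range(n_frames):
        # (b) SNR of the current frame and of the immediately preceding frame
        snr_cur = P[m] / noise
        snr_prev = prev_clean / noise

        # (c) speech absence probability from the current and predicted SNR
        #     (assumed likelihood-ratio form with prior absence probability q)
        expo = np.minimum(snr_cur * pred_snr / (1.0 + pred_snr), 50.0)
        lr = np.exp(expo) / (1.0 + pred_snr)
        p_abs = 1.0 / (1.0 + (1.0 - q) / q * lr)

        # (d) correct both SNRs using the speech absence probability
        snr_cur_c = (1.0 - p_abs) * snr_cur
        snr_prev_c = (1.0 - p_abs) * snr_prev

        # (e) gain from the two corrected SNRs (assumed Wiener-type rule),
        #     then multiply the spectrum of the current frame by the gain
        xi = alpha * snr_prev_c + (1.0 - alpha) * np.maximum(snr_cur_c - 1.0, 0.0)
        gain = xi / (1.0 + xi)
        S = gain * Y[m]

        # (f) update noise and speech power per channel; the noise update is
        #     weighted by p_abs, so it also runs during speech presence
        prev_clean = np.abs(S) ** 2
        noise = zeta * noise + (1.0 - zeta) * (p_abs * P[m] + (1.0 - p_abs) * noise)
        speech = zeta * speech + (1.0 - zeta) * prev_clean
        pred_snr = speech / noise       # predicted SNR for the next frame

        # (g) back to the time domain, with overlap-add of the frames
        out[m * hop:m * hop + frame_len] += np.fft.irfft(S, n=frame_len)
    return out
```

Calling `enhance(x)` on a one-dimensional float array returns an array of the same length. The soft noise update in step (f) is the part of the sketch that mirrors the abstract's point that noise is estimated in speech presence intervals as well as in speech absence intervals.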