A noise reduction system including an audio-visual user interface combines
visual features extracted from a digital video sequence with audio
features extracted from an analog audio sequence. The digital video
sequence may show the face of a speaker, and the analog audio sequence
may include background noise in an environment of said speaker. Audio
sequence detection means are used to detect said analog audio sequence,
and audio feature extraction and analysis means are used to analyze said
analog audio sequence and extract said audio features therefrom. Video
sequence detection means are used to detect said digital video sequence, and
visual feature extraction and analysis means are used to analyze the
detected video sequence and extract said visual features therefrom. A
noise reduction circuit is configured to separate the speaker's voice
from said background noise based on a combination of derived speech
characteristics and to output a speech activity indication signal. The
speech activity indication signal includes a combination of speech
activity estimates supplied by said audio feature extraction and analysis
means and said visual feature extraction and analysis means. A
multi-channel acoustic echo cancellation unit is configured to perform
near-end speaker detection and double-talk detection based on
the speech characteristics derived by said audio feature extraction and
analysis means and said visual feature extraction and analysis means.
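
The combination of audio-based and visual-based speech activity estimates, and the double-talk decision derived from it, can be illustrated with a minimal sketch. The abstract does not specify how the estimates are combined, so the weighted average, the threshold, and all names below (SpeechActivityEstimate, fuse_speech_activity, detect_double_talk, audio_weight) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SpeechActivityEstimate:
    """Per-frame estimates, each assumed to lie in [0, 1] (hypothetical type)."""
    audio: float   # estimate from the audio feature extraction and analysis means
    visual: float  # estimate from the visual feature extraction and analysis means


def fuse_speech_activity(est, audio_weight=0.5, threshold=0.5):
    """Combine both estimates into a score and a binary activity indication.

    A weighted average is an assumed fusion rule, not one given in the text.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    score = audio_weight * est.audio + (1.0 - audio_weight) * est.visual
    return score, score >= threshold


def detect_double_talk(near_end_active, far_end_active):
    """Flag double-talk when near-end and far-end speakers are both active."""
    return near_end_active and far_end_active


# Speaker visibly talking: both modalities agree.
score, active = fuse_speech_activity(SpeechActivityEstimate(audio=0.9, visual=0.7))

# Loud background noise, lips not moving: the visual estimate vetoes the audio one.
noise_score, noise_active = fuse_speech_activity(
    SpeechActivityEstimate(audio=0.8, visual=0.1), audio_weight=0.3
)
```

In this sketch a lower audio_weight makes the indication rely more on the visual estimate, which is one plausible choice when the background noise level is high.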