A multichannel source activity detection system, e.g., a voice activity
detection (VAD) system, and a method that exploit spatial localization of
a target audio source are provided. The method includes the steps of
receiving a mixed sound signal by at least two microphones; Fast Fourier
transforming each received mixed sound signal into the frequency domain;
filtering the transformed signals to output a signal corresponding to a
spatial signature of a source; summing the squared absolute value of the
filtered signal over a predetermined range of frequencies; and comparing
the sum to a threshold to determine whether a voice is present. Additionally,
the filtering step includes multiplying the transformed signals by an
inverse of a noise spectral power matrix, a vector of channel transfer
function ratios, and a source signal spectral power.
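
The steps above can be illustrated with a minimal NumPy sketch. It is not the claimed implementation: the function name `detect_activity`, the argument names, the array layouts, and in particular the order in which the three factors are applied (Z = R_s * K^H * R_n^{-1} * X per frequency bin) are assumptions made for illustration, since the abstract names the factors but does not fix their arrangement.

```python
import numpy as np


def detect_activity(frames, noise_psd, tf_ratios, source_psd, band, threshold):
    """Hypothetical sketch of the detection steps for one analysis frame.

    frames     : (n_mics, n_samples) real time-domain samples, n_mics >= 2
    noise_psd  : (n_bins, n_mics, n_mics) noise spectral power matrix per bin
    tf_ratios  : (n_bins, n_mics) channel transfer function ratios per bin
    source_psd : (n_bins,) source signal spectral power per bin
    band       : (lo, hi) bin indices of the summation range, hi exclusive
    threshold  : decision threshold on the summed energy
    """
    # Step 1: FFT of each microphone signal (one-sided spectrum).
    spectra = np.fft.rfft(frames, axis=1)      # (n_mics, n_bins)
    x = spectra.T                              # (n_bins, n_mics)

    # Step 2: filtering. Assumed form: Z = R_s * K^H * R_n^{-1} * X.
    # Solving the linear system avoids forming the matrix inverse explicitly.
    whitened = np.linalg.solve(noise_psd, x[..., None])[..., 0]
    z = source_psd * np.einsum("fm,fm->f", tf_ratios.conj(), whitened)

    # Step 3: sum the squared magnitude over the chosen frequency range.
    lo, hi = band
    energy = float(np.sum(np.abs(z[lo:hi]) ** 2))

    # Step 4: compare the sum to the threshold.
    return energy > threshold, energy
```

With an identity noise matrix, unit transfer function ratios, and unit source power, the filter reduces to summing the microphone spectra, which makes the sketch easy to check on a synthetic tone. In practice the noise matrix, the ratios, and the threshold would have to be estimated from data, which this sketch does not attempt.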