A method and system for tracking human speakers use a plurality of
acoustic sensors arranged in an array to detect the voice of the speakers
within an angular range in order to determine a most favorable direction
for detecting the voice in a detection period. A beamformer is used to
form a plurality of beams each covering a different direction within the
angular range and generate a signal responsive to the voice of the
speakers for each beam. A comparator is used to periodically compare the
power levels of the signals of the different beams in order to determine the
most favorable detection direction according to the movement of the human
speakers. A voice activity detection device is used to indicate to the
comparator when the voice of the speakers is detected so that the
comparator determines the most favorable detection direction based on the
voice of the speakers, and not on noise received while the speakers are
silent during the detection period.
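
The comparator step gated by voice activity detection can be illustrated with a minimal sketch. It assumes the beamformer output is already available as frames of samples per beam and that the voice activity detector supplies one flag per frame. The names frame_power and select_direction, the mean-square power measure, and the choice to keep the previous direction when no voice is detected are illustrative assumptions, not details taken from the abstract.

```python
def frame_power(frame):
    """Mean squared amplitude of one frame of samples (assumed power measure)."""
    return sum(s * s for s in frame) / len(frame)


def select_direction(beam_frames, voice_flags, beam_angles, previous=None):
    """Pick the beam angle with the highest voiced power in one detection period.

    beam_frames[b][f] is frame f (a list of samples) of beam b.
    voice_flags[f] is True when the voice activity detector reports voice in frame f.
    Returns `previous` unchanged when no frame in the period contains voice
    (an assumed fallback, not stated in the abstract).
    """
    totals = [0.0] * len(beam_frames)
    voiced = 0
    for f, has_voice in enumerate(voice_flags):
        if not has_voice:
            continue  # skip noise-only frames so they cannot steer the choice
        voiced += 1
        for b, frames in enumerate(beam_frames):
            totals[b] += frame_power(frames[f])
    if voiced == 0:
        return previous
    best = max(range(len(totals)), key=totals.__getitem__)
    return beam_angles[best]


# Hypothetical data: three beams, two frames each.
# Frame 0 contains voice, strongest in the +30 degree beam.
# Frame 1 is noise only, strongest in the -30 degree beam.
angles = [-30, 0, 30]
beams = [
    [[0.1, -0.1], [0.9, -0.9]],  # beam at -30 degrees
    [[0.3, -0.3], [0.1, -0.1]],  # beam at 0 degrees
    [[0.6, -0.6], [0.1, -0.1]],  # beam at +30 degrees
]
flags = [True, False]

print(select_direction(beams, flags, angles))  # prints 30
```

With the noise-only frame excluded, the +30 degree beam is selected. If both frames were treated as voice, the loud noise in the -30 degree beam would dominate the comparison, which is the error the voice activity detection device is meant to prevent.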