An interactive voice response system is described that supports
full-duplex data transfer, allowing a voice prompt to be played to a user
of a telephony system while the system listens for voice barge-in from the
user. The system includes a speech detection module that may utilize
various criteria such as frame energy magnitude and duration thresholds
to detect speech. The system also includes an automatic speech
recognition engine. When the automatic speech recognition engine
recognizes a segment of speech, a feature extraction module may be used
to subtract a prompt echo spectrum, which corresponds to the currently
playing voice prompt, from an echo-dirtied speech spectrum recorded by
the system. To improve the spectrum subtraction, the time delay between
the echo-dirtied speech and the prompt echo may also be estimated.
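
The three processing steps named above (energy- and duration-based speech detection, delay estimation, and spectrum subtraction) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the described system: the function names, the default thresholds, the use of a brute-force cross-correlation peak to estimate the delay, and the use of plain magnitude subtraction with a floor. The spectra are passed in as precomputed magnitude lists, so no transform is shown.

```python
from typing import List, Sequence


def detect_speech(samples: Sequence[float], frame_len: int = 4,
                  energy_threshold: float = 0.5, min_frames: int = 2) -> bool:
    """Return True once `min_frames` consecutive frames exceed the energy threshold.

    Hypothetical criteria: mean squared amplitude per frame (energy magnitude)
    and a minimum run of loud frames (duration).
    """
    run = 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        # Extend the run on a loud frame, reset it on a quiet one.
        run = run + 1 if energy > energy_threshold else 0
        if run >= min_frames:
            return True
    return False


def estimate_delay(recorded: Sequence[float], prompt: Sequence[float],
                   max_lag: int) -> int:
    """Return the lag in [0, max_lag] (in samples) with the highest cross-correlation.

    Assumed technique: the source only says a delay estimate may be performed.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(recorded[i + lag] * prompt[i]
                    for i in range(len(prompt)) if i + lag < len(recorded))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def subtract_spectrum(dirty_mag: Sequence[float], echo_mag: Sequence[float],
                      floor: float = 0.0) -> List[float]:
    """Subtract the prompt echo magnitude from the echo-dirtied magnitude, bin by bin.

    Results are clamped at `floor` so no bin goes negative.
    """
    return [max(d - e, floor) for d, e in zip(dirty_mag, echo_mag)]
```

In such a sketch the estimated lag would be used to pick the prompt frame that lines up with the recorded frame before the two magnitude spectra are subtracted; a misaligned prompt frame would remove the wrong spectral content.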