Systems and methods are disclosed for packet voice conferencing. An
encoding system accepts two sound field signals, representing the same
sound field sampled at two spatially separated points. The relative delay
between the two sound field signals is detected over a given time
interval. The sound field signals are combined and then encoded as a
single audio signal, e.g., by a method suitable for monophonic VoIP. The
encoded audio payload and the relative delay are placed in one or more
packets and sent to a decoding device via the packet network. The decoding
device uses the relative delay to drive a playout splitter. Once the
encoded audio payload has been decoded, the playout splitter creates
multiple presentation channels by inserting the transmitted relative
delay in the decoded signal for one (or more) of the presentation
channels. The listener thus perceives a speaker's voice as originating
from a location related to the speaker's physical position at the other
end of the conference. An advantage of these embodiments is that a
pseudo-stereo conference can be conducted with virtually the same
bandwidth as a monophonic conference.
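
The signal path summarized above can be illustrated with a minimal sketch. It is not the disclosed implementation: the function names (estimate_relative_delay, combine, playout_split), the use of a brute-force cross-correlation search to detect the delay, and the use of simple averaging to combine the two signals are all assumptions made for illustration. The audio codec and the packet format are omitted, so the combined signal is passed to the splitter directly, and the delay is expressed in whole samples.

```python
from typing import List, Tuple


def estimate_relative_delay(left: List[float], right: List[float], max_lag: int) -> int:
    """Return the lag d (in samples) that maximizes sum(left[i] * right[i + d]).

    A positive result means the right signal lags the left one.
    Cross-correlation is an assumed detection method, not one named in the text.
    """
    n = min(len(left), len(right))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def combine(left: List[float], right: List[float]) -> List[float]:
    """Mix the two sound field signals into one signal (assumed: plain average)."""
    return [(a + b) / 2.0 for a, b in zip(left, right)]


def playout_split(mono: List[float], delay: int) -> Tuple[List[float], List[float]]:
    """Create two presentation channels from the decoded signal.

    One channel is the decoded signal itself; the other is the same signal
    delayed by abs(delay) samples (zero-padded at the start, truncated to the
    original length). The sign of the delay selects which channel is delayed.
    """
    pad = [0.0] * abs(delay)
    delayed = (pad + mono)[: len(mono)]
    if delay >= 0:
        return mono, delayed  # right channel lags
    return delayed, mono  # left channel lags


if __name__ == "__main__":
    # Hypothetical test input: an impulse that reaches the right microphone
    # three samples after it reaches the left one.
    left = [0.0] * 32
    right = [0.0] * 32
    left[5] = 1.0
    right[8] = 1.0

    delay = estimate_relative_delay(left, right, max_lag=10)
    mono = combine(left, right)  # this is what would be encoded and sent
    out_left, out_right = playout_split(mono, delay)  # receiver side
    print(delay, len(out_left), len(out_right))
```

In this sketch only the single combined signal and one integer delay would need to cross the network for each interval, which is the basis of the bandwidth advantage stated above.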