A method and computer program product allow for the efficient allocation of buffers
(e.g., first-in first-out (FIFO) queues) for current and predicted active speakers
in voice conferencing systems. The method and computer program product, implemented
by a server hosting an audio conference for a plurality of speakers, minimize
the loss of audio data for speakers as they switch from "non-active" to "active"
status. This is accomplished by employing a set of active speaker buffers and a
set of predicted active speaker buffers. The predicted active speaker buffers retain
the most recent x packets or m milliseconds of "non-active" speaker audio data.
As speakers become "active", a portion of that data is transferred from the
predicted active speaker buffers to the active speaker buffers. The
x packets or m milliseconds of stored "non-active" speaker audio data can be used
only up to a pre-determined jitter buffer fill-level in order to avoid introducing
additional audio packet delivery delay to participants of the conference.
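
The buffering scheme described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the names PredictedSpeakerBuffer, promote, max_packets, and fill_level are hypothetical, packets are stood in for by plain integers, and the choice to carry over the newest packets (rather than the oldest) when only part of the history fits is an assumption made for illustration.

```python
from collections import deque


class PredictedSpeakerBuffer:
    """Holds only the most recent `max_packets` packets of a non-active speaker.

    Hypothetical helper: `max_packets` plays the role of "x packets" in the text.
    """

    def __init__(self, max_packets):
        # A bounded deque silently drops the oldest packet once it is full.
        self.packets = deque(maxlen=max_packets)

    def push(self, packet):
        self.packets.append(packet)


def promote(predicted, active, fill_level):
    """Move buffered history into the active FIFO when a speaker becomes active.

    Only as many packets are transferred as fit under `fill_level`, the
    assumed jitter buffer fill-level, so no extra delivery delay is added.
    Returns the number of packets transferred.
    """
    room = max(0, fill_level - len(active))
    # Assumption: when not everything fits, keep the newest packets.
    # The explicit check matters because a slice of [-0:] would return
    # the whole list instead of an empty one.
    recent = list(predicted.packets)[-room:] if room else []
    active.extend(recent)
    predicted.packets.clear()
    return len(recent)


# Usage: a non-active speaker sends 8 packets, then becomes active.
predicted = PredictedSpeakerBuffer(max_packets=5)
for seq in range(8):
    predicted.push(seq)

active = deque()
moved = promote(predicted, active, fill_level=3)
print(moved, list(active))
```

In this sketch the predicted buffer bounds memory per non-active speaker, while the fill-level cap bounds how much stored history can enter the active path at the moment of promotion.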