Mixing video signals for an audio and video multimedia conference call

In a multimedia communications system (100) that supports conference calls having an audio portion and a video portion, a primary video image is selected from a plurality of video images based on the amount of audio data generated. The amount of audio data is determined by counting audio packets or by counting audio samples in the audio packets (204). A dominant audio participant is selected if the difference in the amount of audio data among participants exceeds a predetermined threshold (206). If the difference does not exceed the predetermined threshold (206), the dominant audio participant may be determined by comparing the loudness or volume of each audio participant (207, 212). The primary video image is selected to correspond to the dominant audio participant (208, 214). The primary video image then remains constant for a predetermined period of time before it can change (210, 216).
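
The selection logic summarized above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the names `AudioStats`, `select_dominant`, and `PrimaryVideoSelector` are hypothetical, the threshold is assumed to apply to the difference between the two participants with the most audio data, the loudness comparison is assumed to cover all participants, and the hold timer is assumed to restart each time a selection is made.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AudioStats:
    """Per-participant measurements for one observation window (hypothetical)."""
    sample_count: int  # amount of audio data, e.g. audio samples or packets counted
    loudness: float    # any loudness or volume measure; units are an assumption


def select_dominant(stats: Dict[str, AudioStats], threshold: int) -> Optional[str]:
    """Return the dominant audio participant, or None if there are no participants."""
    if not stats:
        return None
    # Rank participants by amount of audio data, largest first.
    ranked = sorted(stats, key=lambda p: stats[p].sample_count, reverse=True)
    if len(ranked) == 1:
        return ranked[0]
    top, runner_up = ranked[0], ranked[1]
    # Assumption: the "difference" is taken between the top two participants.
    if stats[top].sample_count - stats[runner_up].sample_count > threshold:
        return top
    # Difference too small: fall back to comparing loudness.
    return max(stats, key=lambda p: stats[p].loudness)


class PrimaryVideoSelector:
    """Keeps the primary video image constant for a hold period after each selection."""

    def __init__(self, threshold: int, hold_seconds: float) -> None:
        self.threshold = threshold
        self.hold_seconds = hold_seconds
        self.primary: Optional[str] = None
        self._selected_at: Optional[float] = None

    def update(self, stats: Dict[str, AudioStats], now: float) -> Optional[str]:
        """Return the participant whose video is primary at time `now` (seconds)."""
        holding = (
            self.primary is not None
            and self._selected_at is not None
            and now - self._selected_at < self.hold_seconds
        )
        if holding:
            return self.primary  # hold period not yet elapsed; no change allowed
        candidate = select_dominant(stats, self.threshold)
        if candidate is not None:
            self.primary = candidate
            self._selected_at = now  # assumption: timer restarts on every selection
        return self.primary
```

In this sketch a caller would invoke `update` once per measurement window, passing fresh counts and the current time, and route the returned participant's video to the primary image position. Restarting the timer on every selection is only one reading of the hold step; restarting it solely when the primary image actually changes would be an equally plausible variant.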