A method of self-discovery and self-calibration is provided for allowing
arbitrary placement of audio and video components in a multimedia
conferencing system. In particular, one or more markers are provided on
the audio components (e.g., microphone arrays) that are detectable
by the video components (e.g., cameras). A unique signature (e.g., a flashing
sequence or color) characterizes each marker so that its exact
location relative to the camera may be calculated. A self-calibration
operation is then performed to relate, regulate and standardize
dimensions and locations in the conferencing environment with respect to
the video system.
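
As an illustration only, the marker identification and location steps described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the signature table, the brightness threshold, the function names `identify_marker` and `pixel_to_bearing`, and the pinhole camera model with a known horizontal field of view are all hypothetical. The sketch recovers only the direction (azimuth and elevation) of a marker relative to the camera; recovering range as well would require further information, such as a known spacing between markers.

```python
import math

# Hypothetical signature table: marker id -> on/off flashing pattern,
# one bit per video frame. The two patterns are not rotations of each other.
SIGNATURES = {
    "array_A": (1, 0, 1, 1, 0, 0),
    "array_B": (1, 1, 0, 1, 0, 0),
}


def identify_marker(brightness, threshold=128):
    """Return the id of the marker whose flashing pattern matches, or None.

    `brightness` holds one sample per frame for a single candidate blob.
    """
    bits = tuple(1 if b > threshold else 0 for b in brightness)
    n = len(bits)
    for marker_id, pattern in SIGNATURES.items():
        if len(pattern) != n:
            continue
        # The camera may begin observing at any phase of the flashing cycle,
        # so compare against every cyclic rotation of the stored pattern.
        for shift in range(n):
            if bits == pattern[shift:] + pattern[:shift]:
                return marker_id
    return None


def pixel_to_bearing(u, v, width, height, horizontal_fov_deg):
    """Convert a pixel position to (azimuth, elevation) in degrees.

    Assumes an ideal pinhole camera with square pixels and the optical
    axis passing through the image center.
    """
    focal_px = (width / 2) / math.tan(math.radians(horizontal_fov_deg) / 2)
    x = u - width / 2
    y = height / 2 - v  # image rows grow downward
    azimuth = math.degrees(math.atan2(x, focal_px))
    elevation = math.degrees(math.atan2(y, math.hypot(x, focal_px)))
    return azimuth, elevation


if __name__ == "__main__":
    samples = [200, 10, 220, 240, 5, 12]  # per-frame brightness of one blob
    print(identify_marker(samples))
    print(pixel_to_bearing(480, 180, 640, 360, 90.0))
```

In a sketch like this, the bearings of several identified markers would then be the input to the self-calibration step that relates locations in the room to the camera's frame of reference.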