The invention provides a system and method for automatically indexing and
retrieving multimedia content. The method may include separating a
multimedia data stream into audio, visual and text components, segmenting
the audio, visual and text components based on semantic differences,
identifying at least one target speaker using the audio and visual
components, identifying a topic of a multimedia event using the
segmented text and topic category models, generating a summary of the
multimedia event based on the audio, visual and text components, the
identified topic and the identified target speaker, and generating a
multimedia description of the multimedia event based on the identified
target speaker, the identified topic, and the generated summary.
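
By way of illustration only, the later steps of the method (topic identification, summary generation, and description generation) may be sketched as follows. The sketch assumes the stream has already been separated and segmented, and that each segment carries a speaker label from an upstream audio/visual identifier. The names `Segment`, `identify_topic`, `summarize`, and `describe_event` are hypothetical, and the keyword-overlap scoring is a simple stand-in for the topic category models, not the models of the invention.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Segment:
    """One semantically coherent unit of the event (hypothetical structure)."""
    speaker: str  # label assumed to come from audio/visual speaker identification
    text: str     # transcript or caption text for the segment


def _tokens(text):
    """Lowercase whitespace tokens with trailing/leading periods and commas removed."""
    return [w.strip(".,").lower() for w in text.split()]


def identify_topic(segments, topic_models):
    """Score each topic by keyword hits in the segmented text; return the best."""
    words = Counter(t for s in segments for t in _tokens(s.text))
    scores = {topic: sum(words[k] for k in keywords)
              for topic, keywords in topic_models.items()}
    return max(scores, key=scores.get)


def summarize(segments, target_speaker, topic_keywords):
    """Return the target speaker's segment with the most topic keyword hits."""
    candidates = [s for s in segments if s.speaker == target_speaker]
    if not candidates:
        return ""  # target speaker never appears in the event
    best = max(candidates,
               key=lambda s: sum(_tokens(s.text).count(k) for k in topic_keywords))
    return best.text


def describe_event(segments, target_speaker, topic_models):
    """Combine speaker, topic, and summary into one description record."""
    topic = identify_topic(segments, topic_models)
    summary = summarize(segments, target_speaker, topic_models[topic])
    return {"speaker": target_speaker, "topic": topic, "summary": summary}


# Toy input: assumed output of the separation and segmentation steps.
segments = [
    Segment("anchor", "Good evening and welcome to the broadcast."),
    Segment("anchor", "The central bank raised interest rates today."),
    Segment("reporter", "Markets fell after the rates decision."),
]
# Toy topic category models: one keyword set per topic (an assumption).
topic_models = {
    "economy": {"bank", "rates", "markets"},
    "sports": {"game", "score", "team"},
}
description = describe_event(segments, "anchor", topic_models)
```

In this toy input, the resulting record pairs the target speaker with the best-scoring topic and that speaker's most topic-relevant segment, which is the kind of multimedia description that could serve as an index entry for later retrieval.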