A system and method for indexing multimedia files utilizes audio
characteristics of predefined audio content contained in selected
multimedia segments of the multimedia files to distinguish the selected
multimedia segments. In the exemplary embodiment, the predefined audio
content is speech contained in video segments of video files, and the
audio characteristics are speaker characteristics. The
speech-containing video segments are detected by analyzing the audio
contents of the video files. The audio contents of the speech-containing
video segments are then characterized to distinguish the video segments
according to speakers. The indexing of speech-containing video segments
based on speakers allows users to selectively access video segments that
contain speech from a particular speaker without having to manually
search all the speech-containing video segments.
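The three stages above (detecting speech-containing segments from their audio, characterizing each one by speaker, and indexing segments under speaker labels) can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed method. The names `Segment`, `contains_speech`, and `build_speaker_index` are hypothetical, and so are all thresholds. A simple short-term-energy test stands in for speech detection, and each segment is assumed to carry a precomputed speaker feature vector. Grouping uses threshold-based online clustering on cosine distance, a stand-in for a real speaker-characterization stage.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    segment_id: str
    samples: list[float]   # mono audio samples, assumed normalized to [-1, 1]
    features: list[float]  # hypothetical precomputed speaker feature vector


def frame_energies(samples: list[float], frame_len: int) -> list[float]:
    """Mean squared amplitude of each full, non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def contains_speech(seg: Segment, frame_len: int = 160,
                    energy_thresh: float = 0.01,
                    min_active_ratio: float = 0.3) -> bool:
    """Stand-in speech detector: enough frames must exceed an energy threshold."""
    energies = frame_energies(seg.samples, frame_len)
    if not energies:
        return False
    active = sum(1 for e in energies if e > energy_thresh)
    return active / len(energies) >= min_active_ratio


def cosine_distance(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def build_speaker_index(segments: list[Segment],
                        max_dist: float = 0.2) -> dict[str, list[str]]:
    """Map hypothetical speaker labels to the IDs of their speech segments."""
    centroids: list[list[float]] = []
    counts: list[int] = []
    index: dict[str, list[str]] = {}
    for seg in segments:
        if not contains_speech(seg):
            continue  # non-speech segments are not indexed by speaker
        # Find the closest existing speaker centroid within max_dist.
        best, best_d = None, math.inf
        for k, c in enumerate(centroids):
            d = cosine_distance(seg.features, c)
            if d <= max_dist and d < best_d:
                best, best_d = k, d
        if best is None:
            # No close match: treat the segment as a new speaker.
            centroids.append(list(seg.features))
            counts.append(1)
            best = len(centroids) - 1
        else:
            # Update the running mean of the matched speaker's features.
            n = counts[best]
            centroids[best] = [(c * n + f) / (n + 1)
                               for c, f in zip(centroids[best], seg.features)]
            counts[best] = n + 1
        index.setdefault(f"speaker_{best}", []).append(seg.segment_id)
    return index


if __name__ == "__main__":
    loud = [0.5, -0.5] * 800  # synthetic "speech-like" high-energy audio
    segs = [
        Segment("a", loud, [1.0, 0.0]),
        Segment("s", [0.0] * 1600, [1.0, 0.0]),  # silence, skipped
        Segment("b", loud, [0.95, 0.05]),
        Segment("c", loud, [0.0, 1.0]),
    ]
    print(build_speaker_index(segs))
    # → {'speaker_0': ['a', 'b'], 'speaker_1': ['c']}
```

Online clustering lets new speakers be discovered as segments arrive, so the number of speakers does not have to be known in advance. A user can then retrieve every segment listed under one speaker label without scanning all speech-containing segments. A practical system would replace the energy test and the toy feature vectors with proper speech detection and speaker modeling.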