Creating audio-centric, image-centric, and integrated audio-visual summaries page

Systems and methods create high quality audio-centric, image-centric, and integrated audio-visual summaries by seamlessly integrating image, audio, and text features extracted from input video. Integrated summarization may be employed when strict synchronization of audio and image content is not required. Video programming which requires synchronization of the audio content and the image content may be summarized using either an audio-centric or an image-centric approach. Both a machine learning-based approach and an alternative, heuristics-based approach are disclosed. Numerous probabilistic methods may be employed with the machine learning-based learning approach, such as nave Bayes, decision tree, neural networks, and maximum entropy. To create an integrated audio-visual summary using the alternative, heuristics-based approach, a maximum-bipartite-matching approach is disclosed by way of example.