A method classifies segments of a video using an audio signal of the video
and a set of classes. Selected classes of the set are combined into a
subset of important classes, which are important for a specific
highlighting task, and the remaining classes of the set are combined into
a subset of other classes. The subset of important classes and the subset
of other classes are trained with training audio data to form a
task-specific classifier. The audio signal can then be classified with the
task-specific classifier as either important or other to identify
highlights in the video corresponding to the specific highlighting task.
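The grouping, training, and classification steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation. The class names, the choice of which classes count as important, the toy two-dimensional feature vectors, and the use of one diagonal Gaussian per subset are all hypothetical choices made for the example. The abstract specifies neither the audio features nor the model family.

```python
import math
from statistics import mean, pvariance

# Hypothetical: which fine-grained classes are "important" depends on the task.
IMPORTANT = {"applause", "cheering", "excited_speech"}

def to_task_label(cls: str) -> str:
    """Collapse a fine-grained class into one of the two task-specific subsets."""
    return "important" if cls in IMPORTANT else "other"

def train(samples):
    """Fit one diagonal Gaussian per subset from (features, class) pairs.

    Assumption: a diagonal Gaussian stands in for whatever model the
    task-specific classifier actually uses.
    """
    groups = {}
    for x, cls in samples:
        groups.setdefault(to_task_label(cls), []).append(x)
    model = {}
    for label, xs in groups.items():
        dims = list(zip(*xs))                       # one tuple per feature dimension
        means = [mean(d) for d in dims]
        vars_ = [max(pvariance(d), 1e-6) for d in dims]  # floor keeps variances > 0
        model[label] = (means, vars_)
    return model

def log_likelihood(x, means, vars_):
    """Log density of x under an axis-aligned Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, means, vars_))

def classify(model, x):
    """Return 'important' or 'other', whichever subset explains x best."""
    return max(model, key=lambda label: log_likelihood(x, *model[label]))

# Hypothetical training audio features, labelled with fine-grained classes.
training = [
    ((0.90, 0.80), "applause"),
    ((0.80, 0.90), "cheering"),
    ((0.85, 0.75), "excited_speech"),
    ((0.10, 0.20), "speech"),
    ((0.20, 0.10), "music"),
    ((0.15, 0.25), "silence"),
]
model = train(training)
```

Here `classify(model, (0.8, 0.85))` yields `"important"` and `classify(model, (0.1, 0.15))` yields `"other"`. The design point is that the fine-grained labels are pooled before training, so the resulting classifier answers only the two-way question the highlighting task needs.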
The classified audio signal can be used to segment and summarize the
video.
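One way to turn per-segment labels into video segments and a summary is to merge runs of consecutive "important" audio segments into time intervals. The sketch below assumes fixed-length audio segments of `seg_len` seconds and an optional minimum highlight length `min_len`. Both parameters and the function names are hypothetical; the abstract does not say how segmentation or summarization is performed.

```python
from typing import List, Tuple

def highlight_intervals(labels: List[str], seg_len: float,
                        min_len: float = 0.0) -> List[Tuple[float, float]]:
    """Merge runs of consecutive 'important' segments into (start, end) times in seconds."""
    intervals = []
    start = None
    for i, label in enumerate(labels + ["other"]):  # sentinel closes a trailing run
        if label == "important" and start is None:
            start = i
        elif label != "important" and start is not None:
            s, e = start * seg_len, i * seg_len
            if e - s >= min_len:  # drop highlights shorter than min_len
                intervals.append((s, e))
            start = None
    return intervals

def summary_duration(intervals: List[Tuple[float, float]]) -> float:
    """Total length in seconds of a summary made by concatenating the intervals."""
    return sum(e - s for s, e in intervals)
```

For example, with `seg_len=0.5` the labels `["other", "important", "important", "other", "important", "other", "other", "important"]` give the intervals `[(0.5, 1.5), (2.0, 2.5), (3.5, 4.0)]`, a 2-second summary. Raising `min_len` to `1.0` keeps only the first interval.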