The system of the present invention allows a user to generate a
representation of time-based media. The system includes a feature
extraction module for extracting features from media
content. For example, the feature extraction module can detect solos in a
musical performance, or can detect music, applause, speech, and the like.
A formatting module formats a media representation generated by the
system. The formatting module also applies feature extraction information
to the representation, and formats the representation according to a
representation specification. In addition, the system can include an
augmented output device that generates a media representation based on
the feature extraction information and the representation specification.
The methods of the present invention include extracting features from
media content, and formatting the media representation being generated,
using the extracted features and according to a specification or data
structure that defines the representation format. The methods can also
include generating the media representation based on the results of the
formatting.
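
The three steps described above (feature extraction, formatting
according to a representation specification, and generation of the
representation) can be illustrated with a minimal Python sketch. It is
not the implementation of the invention. All names in it (Feature,
RepresentationSpec, extract_features, format_representation,
generate_representation) are hypothetical, and the feature extractor is
a toy stand-in that assumes the media content has already been reduced
to one label per second, where a real module would analyze audio or
video directly.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Feature:
    """One extracted feature: a labelled time segment (hypothetical type)."""
    label: str  # e.g. "music", "applause", "speech"
    start: int  # start time in seconds
    end: int    # end time in seconds (exclusive)


@dataclass(frozen=True)
class RepresentationSpec:
    """Hypothetical representation specification: what to show, and how."""
    include: tuple                      # feature labels to include
    template: str = "{start}s-{end}s {label}"


def extract_features(frame_labels):
    """Toy extractor: merge runs of identical per-second labels into segments.

    Assumption: frame_labels holds one label per second of media.
    """
    features = []
    t = 0
    for label, run in groupby(frame_labels):
        length = len(list(run))
        features.append(Feature(label, t, t + length))
        t += length
    return features


def format_representation(features, spec):
    """Apply the extracted features to the representation per the spec."""
    return [
        spec.template.format(start=f.start, end=f.end, label=f.label)
        for f in features
        if f.label in spec.include
    ]


def generate_representation(lines):
    """Produce the final (here: plain-text) media representation."""
    return "\n".join(lines)


if __name__ == "__main__":
    frames = ["speech"] * 3 + ["music"] * 5 + ["applause"] * 2
    spec = RepresentationSpec(include=("music", "applause"))
    features = extract_features(frames)
    print(generate_representation(format_representation(features, spec)))
```

In this sketch the specification is a small data structure kept
separate from the extracted features, so the same features could be
formatted in different ways by supplying a different specification.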