The context of media content is represented by context description data
having a hierarchical structure. The highest hierarchical layer is formed
from a single element representing the content. The lowest hierarchical
layer is formed from elements, each representing a segment of the media
content that corresponds to a change between scenes or between audible
tones. The remaining hierarchical layers are formed from elements, each
representing a scene or a collection of scenes. A score corresponding to
the context of a scene of
interest is appended, as an attribute, to each element in the remaining
hierarchical layers. A score relating to time information and a context
is appended, as an attribute, to each element in the lowest hierarchical
layer. In a selection step, the context of the media content
is expressed, and one or more scenes are selected based on the score. In
an extraction step, only data pertaining to the selected scenes are
extracted.
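
The layered description and the two steps above can be illustrated with a
minimal Python sketch. Everything named here is hypothetical and not taken
from the source: the class names `Content`, `Scene` and `Segment`, the
`start`/`end` fields (assumed to be seconds), the integer scores, and in
particular the selection rule, which is assumed to be a simple threshold
applied top-down so that a low-scoring scene hides everything beneath it.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Segment:
    """Lowest layer: one media segment with time information and a score."""
    start: float  # assumed unit: seconds
    end: float
    score: int


@dataclass
class Scene:
    """Intermediate layer: a scene or a collection of scenes, with a score."""
    score: int
    children: List[Union["Scene", Segment]] = field(default_factory=list)


@dataclass
class Content:
    """Highest layer: the single element representing the content."""
    children: List[Scene] = field(default_factory=list)


def select_segments(content: Content, threshold: int) -> List[Segment]:
    """Selection step (assumed rule): keep nodes whose score >= threshold."""
    selected: List[Segment] = []

    def visit(node: Union[Scene, Segment]) -> None:
        if node.score < threshold:
            return  # prune this scene or segment and everything below it
        if isinstance(node, Segment):
            selected.append(node)
        else:
            for child in node.children:
                visit(child)

    for scene in content.children:
        visit(scene)
    return selected


def extract_intervals(segments: List[Segment]) -> List[Tuple[float, float]]:
    """Extraction step: merge touching segments into (start, end) ranges."""
    intervals: List[Tuple[float, float]] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if intervals and seg.start <= intervals[-1][1]:
            last_start, last_end = intervals[-1]
            intervals[-1] = (last_start, max(last_end, seg.end))
        else:
            intervals.append((seg.start, seg.end))
    return intervals


# Made-up example data: two top-level scenes, one with nested sub-scenes.
content = Content(children=[
    Scene(score=5, children=[
        Scene(score=4, children=[
            Segment(0.0, 10.0, 4),
            Segment(10.0, 25.0, 3),
            Segment(25.0, 30.0, 1),
        ]),
        Scene(score=1, children=[Segment(30.0, 45.0, 5)]),
    ]),
    Scene(score=3, children=[Segment(45.0, 60.0, 3)]),
])

print(extract_intervals(select_segments(content, threshold=3)))
```

With a threshold of 3, the sketch keeps the first two segments of the
first sub-scene and the segment of the second top-level scene. The
high-scoring segment under the scene with score 1 is dropped because its
parent is pruned first; a different selection rule would treat that case
differently.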