The video signal processor 10 includes a scene detector 16. The scene detector 16
uses features extracted from the visual segments and/or audio segments obtained by
segmenting an input stream of video data, together with a criterion, calculated
for each of the features, for measuring the similarity between pairs of visual
segments and/or audio segments. Using the features and the similarity measurement
criterion, the scene detector 16 detects two visual segments and/or audio segments
whose time gap is within a predetermined temporal threshold and whose dissimilarity
is less than a predetermined dissimilarity threshold, and groups the segments into
a scene consisting of visual segments and/or audio segments that reflect the
semantics of the video data content and are temporally contiguous to each other.
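
The grouping rule described above can be illustrated with a minimal sketch. It is not the implementation of the scene detector 16: the names Segment and detect_scenes, the use of a single feature vector per segment, and the choice of Euclidean distance as the dissimilarity measure are all assumptions made for illustration. The sketch links any two segments whose time gap is within the temporal threshold and whose dissimilarity is below the dissimilarity threshold, then places every segment lying between two linked segments in the same scene, so that each scene is temporally contiguous.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Segment:
    start: float    # start time in seconds
    end: float      # end time in seconds
    feature: tuple  # hypothetical feature vector (assumption)


def detect_scenes(segments, max_gap, max_dissimilarity):
    """Group time-ordered, non-overlapping segments into scenes.

    Returns a list of scenes, each a list of consecutive segment indices.
    Euclidean distance stands in for the dissimilarity measure (assumption).
    """
    n = len(segments)
    # last[i] is the furthest later segment that segment i is linked to.
    last = list(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            gap = segments[j].start - segments[i].end
            if gap > max_gap:
                break  # segments are time-ordered, so later gaps only grow
            if dist(segments[i].feature, segments[j].feature) < max_dissimilarity:
                last[i] = j

    # Sweep in time order; a scene ends once no earlier link reaches further.
    scenes, current, reach = [], [], -1
    for i in range(n):
        if current and i > reach:
            scenes.append(current)
            current = []
        current.append(i)
        reach = max(reach, last[i])
    if current:
        scenes.append(current)
    return scenes
```

With this rule, a segment that is dissimilar to its neighbors but sits between two similar segments (for example, a cutaway shot inside a dialogue) is absorbed into the surrounding scene, while a segment with no qualifying partner forms a scene by itself.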