An apparatus for recognizing the contents of a video composed of picture frames
includes a splitting unit that splits the picture frames into a plurality
of sets of video shots based on cut points indicating a change of screen;
a similar-video-shot extracting unit that extracts video shots similar to
each of the video shots from among the sets of video shots; a
maximum-count-video-shot extracting unit that counts the number of similar
video shots for each of the video shots and extracts a maximum-count
video shot, namely the video shot having the largest number of similar
video shots; and a representative-video-shot determining unit that selects
the maximum-count video shot as a representative video shot for the video.
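
The four units described above can be illustrated with a minimal sketch. It is not the claimed implementation: it assumes each picture frame has already been reduced to a numeric feature vector (for example, a colour histogram), and the L1 distance, the mean-vector shot signature, the two thresholds, and all function names are hypothetical choices made only for illustration.

```python
from typing import List, Sequence

# Assumption: each frame is a feature vector such as a normalised histogram.
Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """L1 distance between two feature vectors (an assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def split_into_shots(frames: List[Frame], cut_threshold: float) -> List[List[Frame]]:
    """Splitting unit: start a new shot wherever consecutive frames
    differ by more than cut_threshold (treated here as a cut point)."""
    shots: List[List[Frame]] = []
    for i, frame in enumerate(frames):
        if i == 0 or frame_distance(frames[i - 1], frame) > cut_threshold:
            shots.append([])
        shots[-1].append(frame)
    return shots


def shot_signature(shot: List[Frame]) -> List[float]:
    """Summarise a shot by the mean of its frame vectors (an assumption)."""
    n = len(shot)
    return [sum(column) / n for column in zip(*shot)]


def count_similar_shots(shots: List[List[Frame]], similarity_threshold: float) -> List[int]:
    """Similar-shot extraction and counting: for each shot, count the other
    shots whose signature lies within similarity_threshold."""
    signatures = [shot_signature(shot) for shot in shots]
    return [
        sum(
            1
            for j, other in enumerate(signatures)
            if j != i and frame_distance(signature, other) <= similarity_threshold
        )
        for i, signature in enumerate(signatures)
    ]


def representative_shot_index(
    frames: List[Frame], cut_threshold: float, similarity_threshold: float
) -> int:
    """Representative-shot determination: index of the shot with the most
    similar shots. Ties resolve to the earliest shot."""
    if not frames:
        raise ValueError("at least one frame is required")
    shots = split_into_shots(frames, cut_threshold)
    counts = count_similar_shots(shots, similarity_threshold)
    return max(range(len(counts)), key=counts.__getitem__)
```

In this sketch a shot that recurs throughout the video, such as a studio view that the video keeps cutting back to, accumulates the highest count and is therefore returned as the representative. How ties are broken is not stated in the text above; choosing the earliest shot is an assumption of the sketch.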