Methods are disclosed for interactively selecting video queries, consisting
of training images from a video, for a video similarity search, and for
displaying the results of that search. The user selects a time
interval in the video as a query definition of training images for
training an image class statistical model. A time interval can be as short
as one frame, or it can consist of disjoint segments or shots. A statistical
model of the image class defined by the training images is calculated
on-the-fly from feature vectors extracted from transforms of the training
images. For each frame in the video, a feature vector is extracted from
the transform of the frame, and a similarity measure is calculated using
the feature vector and the image class statistical model. The similarity
measure is derived from the likelihood of a Gaussian model producing the
frame. The similarity is then presented graphically, which allows the
time structure of the video to be visualized and browsed. Similarity can
be rapidly calculated for other video files as well, which enables
content-based retrieval by example. A content-aware video browser
featuring interactive similarity measurement is presented. A method for
selecting training segments involves mouse click-and-drag operations over
a time bar representing the duration of the video; similarity results are
displayed as shades in the time bar. Another method involves selecting
periodic frames of the video as endpoints for the training segment.
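
The scoring and display steps described above can be illustrated with a short sketch. It is not the disclosed implementation. It assumes a diagonal-covariance Gaussian, where the abstract says only "Gaussian model". It assumes each frame has already been reduced to a feature vector, with the transform and feature extraction left out. It also assumes a linear mapping from log-likelihood to a gray level for the time bar. All function names, the variance floor, and the sample data are hypothetical.

```python
import math


def fit_diagonal_gaussian(train_vectors, var_floor=1e-6):
    """Estimate the per-dimension mean and variance of the training vectors.

    Assumption: the covariance is diagonal. The variance floor keeps the
    model usable when the query is a single frame (zero variance).
    """
    n = len(train_vectors)
    dim = len(train_vectors[0])
    mean = [sum(v[d] for v in train_vectors) / n for d in range(dim)]
    var = [
        max(sum((v[d] - mean[d]) ** 2 for v in train_vectors) / n, var_floor)
        for d in range(dim)
    ]
    return mean, var


def log_likelihood(x, mean, var):
    """Log-likelihood of feature vector x under the diagonal Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * s) + (xi - m) ** 2 / s)
        for xi, m, s in zip(x, mean, var)
    )


def score_frames(frame_vectors, start, end):
    """Train on frames start..end (inclusive), then score every frame."""
    mean, var = fit_diagonal_gaussian(frame_vectors[start:end + 1])
    return [log_likelihood(x, mean, var) for x in frame_vectors]


def to_shades(scores, levels=256):
    """Map scores linearly to integer gray levels for a time bar.

    Assumption: the highest score maps to the brightest level.
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [levels - 1] * len(scores)
    return [round((s - lo) / (hi - lo) * (levels - 1)) for s in scores]


# Hypothetical 2-D feature vectors: frames 0-2 resemble each other,
# frames 3-4 belong to a visually different shot.
frames = [[0.0, 1.0], [0.1, 0.9], [0.05, 1.1], [5.0, -3.0], [4.8, -2.9]]
scores = score_frames(frames, start=0, end=2)  # query = frames 0..2
shades = to_shades(scores)
```

Because the model is just a mean and a variance per feature dimension, it can be refit each time the user changes the selected interval, and the same fitted model can score feature vectors from other video files.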