Computationally efficient searching, browsing and retrieval of one or more
objects in a video sequence are accomplished using learned generative
models. A generative model is trained on an automatically or manually
selected query sequence from a sequence of image frames. The resulting
generative model is then used to search, browse, or retrieve one or more
similar or dissimilar image frames or subsequences within the image
sequence by computing the likelihood of each frame under the learned
generative model. Further, this method allows for automatic separation
and balancing of various causes of variability while analyzing the image
sequence. The generative models are based on appearances of multiple,
possibly occluding objects in an image sequence. Further, the search
strategies used include clustering and intelligent fast forward through
the image sequence. Additionally, in one embodiment, a fast forward speed
is relative to the current frame likelihood under the learned generative
model.
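The likelihood-based search and fast-forward described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the abstract. It assumes each frame is a flat list of pixel intensities, and it swaps the multi-object, occlusion-aware generative model for a much simpler one: an independent Gaussian per pixel, fit to the query frames. The names `fit_model`, `log_likelihood`, `rank_frames` and `fast_forward` are hypothetical. So is the step policy: move one frame at a time while the current frame's log-likelihood is at or above a hypothetical `threshold`, and otherwise skip ahead by up to `max_step` frames, in proportion to how far the score falls below it.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]                  # one frame, flattened to pixel intensities
Model = Tuple[List[float], List[float]]  # (per-pixel means, per-pixel variances)


def fit_model(query: Sequence[Frame], min_var: float = 1e-3) -> Model:
    """Fit an independent Gaussian per pixel to the query frames.

    Stand-in for a learned generative model (assumption): it has no notion
    of objects, layers or occlusion. `min_var` keeps variances positive.
    """
    n = len(query)
    num_pixels = len(query[0])
    means = [sum(f[i] for f in query) / n for i in range(num_pixels)]
    variances = [
        max(min_var, sum((f[i] - means[i]) ** 2 for f in query) / n)
        for i in range(num_pixels)
    ]
    return means, variances


def log_likelihood(frame: Frame, model: Model) -> float:
    """Log-density of a frame under the per-pixel Gaussian model."""
    means, variances = model
    total = 0.0
    for x, mu, var in zip(frame, means, variances):
        total -= 0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return total


def rank_frames(frames: Sequence[Frame], model: Model) -> List[int]:
    """Frame indices ordered from most to least likely (most similar first)."""
    scores = [log_likelihood(f, model) for f in frames]
    # sorted() is stable, so frames with equal scores keep their video order.
    return sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)


def fast_forward(
    frames: Sequence[Frame],
    model: Model,
    threshold: float = 0.0,
    scale: float = 100.0,
    max_step: int = 4,
) -> List[int]:
    """Indices visited when the skip length grows as the likelihood drops.

    Hypothetical policy: a step of one frame while the log-likelihood is
    >= threshold; otherwise a step of 1 + (threshold - ll) / scale frames,
    capped at max_step.
    """
    visited: List[int] = []
    i = 0
    while i < len(frames):
        visited.append(i)
        ll = log_likelihood(frames[i], model)
        if ll >= threshold:
            step = 1  # likely frame: slow down to normal speed
        else:
            step = min(max_step, 1 + int((threshold - ll) / scale))
        i += step
    return visited


if __name__ == "__main__":
    near, far = [0.1, 0.1], [5.0, 5.0]
    model = fit_model([[0.0, 0.0], [0.2, 0.2]])  # hypothetical query frames
    video = [far] * 10 + [near] * 3 + [far] * 10
    print(rank_frames(video, model)[:3])  # [10, 11, 12]
    print(fast_forward(video, model))     # [0, 4, 8, 12, 13, 17, 21]
```

In the usage example, the fast-forward jumps past frames 10 and 11 and slows down only when it reaches frame 12. A larger `max_step` speeds up browsing but raises the risk of skipping short matching segments.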