A method of tracking an object such as a face in a video stream comprises
running an object detector at a plurality of locations on a first frame,
the locations defining a coarse grid. This is repeated for second and
subsequent frames, with the grid slightly offset each time so that all of
the points on a fine grid are ultimately covered, but over several
passes. When an
object such as a face is located on one frame, positional and/or scale
information is propagated to the next frame to assist in tracking that
object in the next frame.
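
The offset-grid scan and the frame-to-frame propagation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names, the step sizes, the raster order of the offsets, and the choice to propagate position only (scale is omitted) are all hypothetical, and the detector itself is left out.

```python
from itertools import product


def pass_offsets(coarse_step, fine_step):
    """Offsets (dx, dy) that shift the coarse grid onto every fine-grid point.

    Assumes coarse_step is a multiple of fine_step (illustrative choice).
    """
    shifts = range(0, coarse_step, fine_step)
    return list(product(shifts, shifts))


def coarse_grid(width, height, coarse_step, offset):
    """Coarse-grid locations for one frame, shifted by the given offset."""
    dx, dy = offset
    return [(x, y)
            for y in range(dy, height, coarse_step)
            for x in range(dx, width, coarse_step)]


def locations_for_frame(frame_index, width, height, coarse_step, fine_step,
                        previous_hits=()):
    """Locations to test on one frame: the offset grid plus propagated hits."""
    offsets = pass_offsets(coarse_step, fine_step)
    # Cycle through the offsets so successive frames use shifted grids.
    offset = offsets[frame_index % len(offsets)]
    points = coarse_grid(width, height, coarse_step, offset)
    # Propagation (position only here): re-test where objects were found
    # on the previous frame, even if those points are off this frame's grid.
    extra = [p for p in previous_hits if p not in points]
    return points + extra


# Usage: an 8x8 frame with coarse step 4 and fine step 2 needs 4 passes.
covered = set()
for i in range(len(pass_offsets(4, 2))):
    covered.update(locations_for_frame(i, 8, 8, 4, 2))
fine = {(x, y) for y in range(0, 8, 2) for x in range(0, 8, 2)}
```

In this sketch each frame costs only a quarter of a full fine-grid scan, while the union of four consecutive passes equals the fine grid; locations passed in `previous_hits` are tested on every frame regardless of the current offset.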