An object is tracked across a plurality of image frames. In an initial frame,
an operator selects the object. The object is distinguished from the
remaining background portion of the image to yield a background and a
foreground. A model of the background and a model of the foreground are each
used and updated in subsequent frames. Pixels in each subsequent frame are
classified as belonging to the
background or the foreground. In subsequent frames, decisions are made,
including: which pixels do not belong to the background; which pixels in
the foreground are to be updated; which pixels in the background were
observed incorrectly in the current frame; and which background pixels are
being observed for the first time. In addition, mask filtering is
performed to correct errors, eliminate small islands and maintain spatial
and temporal coherency of a foreground mask.
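
The classification, island removal, and selective background update described above can be illustrated with a minimal sketch. It is not the disclosed method: the absolute-difference threshold, the 4-connected region search, the running-average update, and all function names and parameter values (classify_pixels, remove_small_islands, update_background, threshold, min_size, rate) are assumptions chosen for illustration, and grayscale frames are represented as nested lists.

```python
from collections import deque


def classify_pixels(frame, background, threshold=20):
    """Return a mask with 1 where a pixel differs from the background model.

    Assumption: a plain absolute-difference threshold stands in for the
    classification rule, which the text does not specify.
    """
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def remove_small_islands(mask, min_size=3):
    """Clear 4-connected foreground regions smaller than min_size pixels."""
    height, width = len(mask), len(mask[0])
    cleaned = [row[:] for row in mask]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if mask[y][x] == 0 or seen[y][x]:
                continue
            # Breadth-first search collects one connected region.
            region, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < height and 0 <= nx < width and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) < min_size:
                for ry, rx in region:
                    cleaned[ry][rx] = 0
    return cleaned


def update_background(background, frame, raw_mask, rate=0.5):
    """Blend the frame into the model only where the pixel matched the model.

    Pixels flagged in raw_mask are left unchanged, so both foreground pixels
    and misobserved background pixels do not disturb the model.
    """
    return [
        [b if m else (1 - rate) * b + rate * p for b, p, m in zip(bg_row, fr_row, m_row)]
        for bg_row, fr_row, m_row in zip(background, frame, raw_mask)
    ]


# Hypothetical 5x5 example: a 2x2 object, one noisy pixel, one slow drift.
background = [[10] * 5 for _ in range(5)]
frame = [row[:] for row in background]
for y, x in ((1, 1), (1, 2), (2, 1), (2, 2)):
    frame[y][x] = 200      # the tracked object
frame[4][4] = 200          # an isolated noisy pixel
frame[0][0] = 14           # small drift, below the threshold

raw_mask = classify_pixels(frame, background)
mask = remove_small_islands(raw_mask)
background = update_background(background, frame, raw_mask)
```

In this sketch the isolated pixel is dropped from the foreground mask by the island filter, yet it is also withheld from the background update because it did not match the model. That is one simple way to treat a background pixel as observed incorrectly in the current frame.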