A simplified general model and an associated estimation algorithm are
provided for modeling visual data such as a video sequence. Specifically,
images or frames in a video sequence are represented as collections of
flat moving objects that change their appearance and shape over time and
can occlude each other. A statistical generative model is
defined for generating such visual data when parameters such as
appearance bitmaps and noise, shape bitmaps and variability in shape,
etc., are known. When these parameters are unknown, they are estimated
from the visual data, without pre-processing, by using a maximization
algorithm. Through parameter estimation and inference in the model, visual
data is segmented into components, which facilitates sophisticated
applications in video or image editing such as, for example, object
removal or insertion, tracking and visual surveillance, video browsing,
photo organization, video compositing, etc.
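
The layered representation described above can be illustrated with a
minimal sketch. It covers only the compositing step of a generative
model of this kind: flat objects, each with an appearance bitmap and a
shape bitmap (mask), are shifted and drawn back to front so that nearer
objects occlude farther ones, and optional pixel noise is added. The
names shift and compose_frame, the integer translations, the wrap-around
at image borders, and the Gaussian noise are assumptions made for
illustration; they are not taken from the source, and the sketch does
not include variability in shape or any parameter estimation.

```python
import random


def shift(image, dy, dx):
    """Translate a 2-D list by (dy, dx), wrapping at the borders (an assumption)."""
    h, w = len(image), len(image[0])
    return [[image[(y - dy) % h][(x - dx) % w] for x in range(w)]
            for y in range(h)]


def compose_frame(layers, background, noise_std=0.0, rng=random):
    """Composite flat layers over a background, back to front.

    Each layer is (appearance, mask, (dy, dx)): an appearance bitmap, a
    shape bitmap with values in [0, 1], and an integer translation.
    Layers later in the list are nearer the viewer and occlude earlier ones.
    """
    frame = [[float(v) for v in row] for row in background]
    h, w = len(frame), len(frame[0])
    for appearance, mask, (dy, dx) in layers:
        a = shift(appearance, dy, dx)
        m = shift(mask, dy, dx)
        # Where the mask is 1 the layer hides whatever lies behind it;
        # where it is 0 the pixels behind show through unchanged.
        frame = [[m[y][x] * a[y][x] + (1.0 - m[y][x]) * frame[y][x]
                  for x in range(w)] for y in range(h)]
    if noise_std > 0:
        # Independent Gaussian pixel noise (hypothetical noise model).
        frame = [[v + rng.gauss(0.0, noise_std) for v in row] for row in frame]
    return frame
```

Estimation in a model like this would run in the opposite direction,
recovering the bitmaps and translations from observed frames; the sketch
shows only how a frame would be produced once they are known.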