A visual motion analysis method that uses multiple layered global motion models
to both detect and reliably track an arbitrary number of moving objects appearing
in image sequences. Each global model includes a background layer and one or more
foreground "polybones", each foreground polybone including a parametric shape model,
an appearance model, and a motion model describing an associated moving object.
Each polybone includes an exclusive spatial support region and a probabilistic
boundary region, and is assigned an explicit depth ordering. Multiple global models
that differ in number of layers, depth ordering, motion, etc., and that correspond
to the detected objects are generated, refined using, for example, an
expectation-maximization (EM) algorithm, and then compared and ranked. Initial
guesses for the model parameters are drawn from
a proposal distribution over the set of likely models. Bayesian model selection
is used to compare and rank the different models, and models having relatively
high posterior probability are retained for subsequent analysis.
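
The ranking and retention step described above can be illustrated with a minimal sketch. It is not the claimed method, only one plausible reading of it: each candidate global model is assumed to carry a log-likelihood (as it might stand after EM refinement) and a log-prior, the two are summed into an unnormalized log-posterior, and models within a fixed gap of the best score are kept. The class `CandidateModel`, the functions `rank_models` and `retain_best`, the threshold `max_log_gap`, and all numeric values are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateModel:
    """One candidate global model (hypothetical container, not from the source)."""
    name: str
    num_polybones: int      # number of foreground layers on top of the background
    log_likelihood: float   # log p(data | model), e.g. after EM refinement
    log_prior: float        # log p(model); assumed to penalize extra layers

    @property
    def log_posterior(self) -> float:
        # Unnormalized log-posterior: log p(data | model) + log p(model).
        return self.log_likelihood + self.log_prior


def rank_models(candidates: List[CandidateModel]) -> List[CandidateModel]:
    """Sort candidates from highest to lowest unnormalized log-posterior."""
    return sorted(candidates, key=lambda m: m.log_posterior, reverse=True)


def retain_best(candidates: List[CandidateModel],
                max_log_gap: float = 5.0) -> List[CandidateModel]:
    """Keep models whose log-posterior is within max_log_gap of the best one.

    The fixed-gap rule is an assumption; the source only says that models with
    relatively high posterior probability are retained.
    """
    ranked = rank_models(candidates)
    if not ranked:
        return []
    best = ranked[0].log_posterior
    return [m for m in ranked if best - m.log_posterior <= max_log_gap]


# Illustrative values only; they are not measurements from any experiment.
candidates = [
    CandidateModel("background_only", 0, log_likelihood=-1250.0, log_prior=-1.0),
    CandidateModel("one_polybone", 1, log_likelihood=-1100.0, log_prior=-3.0),
    CandidateModel("two_polybones", 2, log_likelihood=-1098.0, log_prior=-6.0),
    CandidateModel("three_polybones", 3, log_likelihood=-1097.5, log_prior=-20.0),
]

for model in retain_best(candidates):
    print(model.name, model.log_posterior)
```

With these made-up numbers, adding a second polybone improves the data fit slightly, but the assumed prior penalty outweighs the gain, so the one-polybone model ranks first and the two-polybone model is retained as a close alternative. The retained set would then seed the next round of proposals and refinement.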