Real-time segmentation of foreground from background layers in binocular
video sequences may be provided by a segmentation process based on one or
more factors, including likelihoods for stereo matching, color, and
optionally contrast, which may be fused to infer foreground and/or
background layers accurately and efficiently. In one example, the
stereo image may be segmented into foreground, background, and/or
occluded regions using stereo disparities. The stereo-match likelihood
may be fused with a contrast-sensitive color model that is initialized or
learned from training data. Segmentation may then be solved by an
optimization algorithm such as dynamic programming or graph cut. In a
second example, the stereo-match likelihood may be marginalized over
foreground and background hypotheses, and fused with a contrast-sensitive
color model that is initialized or learned from training data.
Segmentation may then be solved by an optimization algorithm such as a
binary graph cut.
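
The fusion and optimization steps described above can be illustrated with a
minimal sketch. It is not the claimed method: it assumes that per-pixel
foreground probabilities from the stereo and color cues are already
available, labels a single scanline as foreground or background only (no
occluded label), and uses a simple dynamic program in place of a graph cut.
The function name segment_scanline, its parameters smooth and beta, and the
exponential form of the contrast term are all hypothetical choices made for
illustration.

```python
import math


def segment_scanline(stereo_fg, color_fg, intensity, smooth=2.0, beta=0.01):
    """Label each pixel of one scanline as foreground (1) or background (0).

    stereo_fg[i], color_fg[i]: assumed per-pixel probabilities that pixel i
        is foreground according to the stereo-match and color cues.
    intensity[i]: grayscale value, used only for the contrast term.
    smooth, beta: hypothetical weights of the contrast-sensitive penalty.
    """
    n = len(intensity)
    eps = 1e-9  # guards against log(0)

    def unary(i, label):
        # Fuse the two cues by summing their negative log-likelihoods.
        ps = stereo_fg[i] if label == 1 else 1.0 - stereo_fg[i]
        pc = color_fg[i] if label == 1 else 1.0 - color_fg[i]
        return -math.log(max(ps, eps)) - math.log(max(pc, eps))

    cost = [[unary(0, 0), unary(0, 1)]]  # best cost of a path ending in each label
    back = []                            # back-pointers for path recovery
    for i in range(1, n):
        # Contrast-sensitive penalty: changing label is cheap across a
        # strong intensity edge and expensive inside a uniform region.
        diff = intensity[i] - intensity[i - 1]
        switch = smooth * math.exp(-beta * diff * diff)
        row, ptr = [], []
        for label in (0, 1):
            candidates = [
                cost[-1][prev] + (switch if prev != label else 0.0)
                for prev in (0, 1)
            ]
            best_prev = 0 if candidates[0] <= candidates[1] else 1
            row.append(candidates[best_prev] + unary(i, label))
            ptr.append(best_prev)
        cost.append(row)
        back.append(ptr)

    # Trace the minimum-cost labeling backwards from the last pixel.
    label = 0 if cost[-1][0] <= cost[-1][1] else 1
    labels = [label]
    for ptr in reversed(back):
        label = ptr[label]
        labels.append(label)
    labels.reverse()
    return labels
```

In this sketch, a call such as segment_scanline([0.9, 0.9, 0.4, 0.9, 0.9],
[0.5] * 5, [100] * 5) labels every pixel as foreground: the weak stereo
evidence at the middle pixel is outweighed by the cost of switching labels
twice inside a region of uniform intensity. A graph cut would play the same
role over the full two-dimensional image rather than one scanline at a time.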