A system and process for tracking an object state over time using particle filter
sensor fusion and a plurality of logical sensor modules is presented. The new
framework combines bottom-up and top-down approaches to probabilistically fuse
multiple sensing modalities. At the lower level,
individual vision and audio trackers can be designed to generate effective proposals
for the fuser. At the higher level, the fuser performs reliable tracking by verifying
hypotheses over multiple likelihood models from multiple cues. Unlike traditional
fusion algorithms, the present framework is a closed-loop system in which
the fuser and trackers coordinate their tracking information. Furthermore, to handle
non-stationary situations, the present framework evaluates the performance of the
individual trackers and dynamically updates their object states. A real-time speaker
tracking system based on the proposed framework can be realized by fusing object
contour, color, and sound source location.
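
The two-level structure summarized above maps naturally onto an importance-sampling step: the low-level trackers supply proposal distributions, and the fuser weights the resulting hypotheses by the product of the cue likelihoods before resampling. The sketch below illustrates this in Python for a one-dimensional state; the Gaussian tracker proposals, cue likelihood models, and all numerical parameters are assumptions made for illustration, not the system's actual sensor models.

```python
# A minimal sketch of the two-level fusion idea, assuming a 1-D object state
# (e.g., horizontal speaker position) and Gaussian stand-ins for the tracker
# proposals and cue likelihoods. All values below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N = 500  # number of particles maintained by the fuser


def gaussian_pdf(x, mean, std):
    """Gaussian density; used for both proposals and likelihoods."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def fuse_step(tracker_estimates, cue_likelihoods, prior_particles):
    """One fuser update.

    tracker_estimates: list of (mean, std) proposals from the low-level
        trackers (e.g., contour, color, audio trackers).
    cue_likelihoods: list of callables p(z | x), one per cue.
    prior_particles: particles carrying the state from the previous frame.
    """
    # Proposal: sample hypotheses from a mixture of the tracker proposals,
    # so each tracker contributes candidates to the fuser.
    means = np.array([m for m, _ in tracker_estimates])
    stds = np.array([s for _, s in tracker_estimates])
    comp = rng.integers(len(tracker_estimates), size=N)
    particles = rng.normal(means[comp], stds[comp])

    # Verification: weight each hypothesis by the product of the cue
    # likelihoods times a random-walk dynamics prior, with the usual
    # importance correction for the mixture proposal.
    dyn_std = 5.0  # assumed motion noise
    prior = gaussian_pdf(particles[:, None], prior_particles[None, :], dyn_std).mean(axis=1)
    lik = np.ones(N)
    for p_z_given_x in cue_likelihoods:
        lik *= p_z_given_x(particles)
    proposal = np.mean(
        [gaussian_pdf(particles, m, s) for m, s in tracker_estimates], axis=0
    )
    w = lik * prior / np.maximum(proposal, 1e-12)
    w /= w.sum()

    # Resample and report the fused estimate, which a closed-loop system
    # would feed back to re-initialize under-performing trackers.
    particles = particles[rng.choice(N, size=N, p=w)]
    return particles, particles.mean()


# Illustrative single frame: three tracker proposals and three cue likelihoods.
trackers = [(10.0, 3.0), (12.0, 2.0), (30.0, 8.0)]  # last tracker is off-target
cues = [
    lambda x: gaussian_pdf(x, 11.0, 2.5),  # contour cue
    lambda x: gaussian_pdf(x, 10.5, 3.0),  # color cue
    lambda x: gaussian_pdf(x, 12.5, 4.0),  # sound-source-location cue
]
prev_particles = rng.normal(11.0, 4.0, size=N)
particles, fused_mean = fuse_step(trackers, cues, prev_particles)
print(f"fused state estimate: {fused_mean:.2f}")
```

In this sketch the off-target tracker still contributes proposals, but its hypotheses receive low weight under the combined likelihoods, which is the sense in which the fuser verifies hypotheses across cues rather than trusting any single tracker.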