Methods and systems for estimating a pose of a subject. The subject can be
a human, an animal, a robot, or the like. The system includes a camera that receives depth information associated with the subject, a pose estimation module that determines a pose or action of the subject from the images, and an interaction module that outputs a response to the perceived pose or action.
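As an illustration only, the three components might be wired together as in the following Python sketch; the class and method names (DepthCamera, PoseEstimator, InteractionModule, read_frame, estimate, respond) are hypothetical stand-ins rather than terms used by the system itself, and stubs take the place of real hardware and models.

```python
# Hypothetical sketch of the camera -> pose estimation -> interaction pipeline.
import numpy as np

class DepthCamera:
    def read_frame(self) -> np.ndarray:
        # Stand-in for a real depth sensor: a 480x640 depth image in meters.
        return np.random.uniform(0.5, 4.0, size=(480, 640))

class PoseEstimator:
    def estimate(self, depth_frame: np.ndarray) -> str:
        # Placeholder for the segmentation and critical-point logic described
        # below; returns a coarse pose label.
        return "standing" if depth_frame.mean() > 2.0 else "crouching"

class InteractionModule:
    def respond(self, pose: str) -> None:
        # Output a response to the perceived pose or action.
        print(f"Perceived pose: {pose}")

if __name__ == "__main__":
    camera, estimator, interaction = DepthCamera(), PoseEstimator(), InteractionModule()
    frame = camera.read_frame()
    interaction.respond(estimator.estimate(frame))
```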
The pose estimation module separates portions of the image containing the subject into classified and unclassified portions. The portions can be segmented using k-means clustering. The classified portions can be known objects, such as a head and a torso, that are tracked across the images.
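A minimal sketch of such a k-means segmentation over the subject's depth pixels is shown below; the cluster count, the depth-versus-position weighting, the function name segment_portions, and the use of scikit-learn are illustrative assumptions rather than details of the system.

```python
# Segment the subject's foreground pixels into portions with k-means,
# clustering on (x, y, depth) features.
import numpy as np
from sklearn.cluster import KMeans

def segment_portions(depth: np.ndarray, mask: np.ndarray, n_portions: int = 6) -> np.ndarray:
    ys, xs = np.nonzero(mask)                                     # foreground pixel coordinates
    features = np.column_stack([xs, ys, depth[ys, xs] * 100.0])   # weight depth vs. position
    labels = KMeans(n_clusters=n_portions, n_init=10, random_state=0).fit_predict(features)
    segmented = np.full(depth.shape, -1, dtype=int)               # -1 marks background
    segmented[ys, xs] = labels
    return segmented

# Example with synthetic data: a rectangular "subject" in front of a flat background.
depth = np.full((120, 160), 3.0)
mask = np.zeros_like(depth, dtype=bool)
mask[30:90, 60:100] = True
depth[mask] = 1.5
portion_map = segment_portions(depth, mask)
print(np.unique(portion_map))   # -1 plus the portion labels 0..5
```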
The unclassified portions are swept along the x and y axes to identify local minima and local maxima, and critical points are derived from these extrema.
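The following sketch illustrates one way such a sweep could be carried out on a single unclassified portion, assuming the portion is given as a binary mask; profiling the mask by per-row and per-column pixel counts, and the names local_extrema and critical_points, are assumptions made for illustration.

```python
import numpy as np

def local_extrema(profile: np.ndarray) -> np.ndarray:
    # Indices where the 1-D profile is a strict local minimum or maximum.
    left, mid, right = profile[:-2], profile[1:-1], profile[2:]
    return np.nonzero(((mid > left) & (mid > right)) | ((mid < left) & (mid < right)))[0] + 1

def critical_points(portion_mask: np.ndarray) -> list[tuple[int, int]]:
    points = []
    # Sweep along the y axis: profile the portion's width in each row.
    for y in local_extrema(portion_mask.sum(axis=1)):
        xs = np.nonzero(portion_mask[y])[0]
        if xs.size:
            points.append((int(xs.mean()), int(y)))   # (x, y) critical point
    # Sweep along the x axis: profile the portion's height in each column.
    for x in local_extrema(portion_mask.sum(axis=0)):
        ys = np.nonzero(portion_mask[:, x])[0]
        if ys.size:
            points.append((int(x), int(ys.mean())))
    return points

# Example: a diamond-shaped portion has a single width maximum at its centre,
# so the sweep yields one critical point per axis.
yy, xx = np.mgrid[0:60, 0:60]
mask = (np.abs(yy - 30) + np.abs(xx - 30)) < 20
print(critical_points(mask))
```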
Potential joint sections are identified by connecting various critical points, and the joint sections having sufficient probability of corresponding to an object on the subject are selected.
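One possible way to score and select such joint sections is sketched below; the Gaussian length model, the expected lengths, the probability threshold, and the function names are illustrative assumptions rather than parameters of the system.

```python
# Form candidate joint sections by connecting pairs of critical points and
# keep only those whose length is probable for some body part.
import itertools
import math

EXPECTED_LENGTH_PX = {"forearm": 60.0, "upper_arm": 70.0, "lower_leg": 90.0}
LENGTH_STD_PX = 15.0
MIN_PROBABILITY = 0.5

def section_probability(p1, p2, part):
    # Unnormalized Gaussian likelihood of the segment length for this part.
    z = (math.dist(p1, p2) - EXPECTED_LENGTH_PX[part]) / LENGTH_STD_PX
    return math.exp(-0.5 * z * z)

def select_joint_sections(points):
    selected = []
    for (p1, p2), part in itertools.product(itertools.combinations(points, 2),
                                            EXPECTED_LENGTH_PX):
        prob = section_probability(p1, p2, part)
        if prob >= MIN_PROBABILITY:
            selected.append((p1, p2, part, prob))
    return selected

# Example with three critical points in pixel coordinates.
print(select_joint_sections([(40, 60), (95, 85), (100, 150)]))
```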