A system uses a camera to capture an image and determine a position of an
object. A first processing module recognizes a set of
predetermined landmarks, including a first landmark and remainder
landmarks, in the image. A second processing module determines an actual
location of the first landmark in the image, and applies at least one
filtering scheme to estimate positions of the remainder landmarks in the
image. A third processing module determines a pose of the object based on
the actual location of the first landmark and the estimated positions of
the remainder landmarks.
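
The three-module flow above can be illustrated with a minimal sketch. It is not the disclosed implementation, and it rests on several assumptions: the "filtering scheme" is taken to be simple exponential smoothing, the pose is reduced to a 2D in-plane rotation plus translation, and all names (smooth, estimate_remainder, pose_2d, determine_pose, alpha, template) are hypothetical. The first landmark is used at its actual detected location, the remainder landmarks are filtered, and a least-squares rigid fit against a known landmark template yields the pose.

```python
import math

def smooth(prev, measured, alpha=0.5):
    # Exponential smoothing, an ASSUMED stand-in for "at least one filtering scheme".
    return tuple(alpha * m + (1 - alpha) * p for p, m in zip(prev, measured))

def estimate_remainder(prev_estimates, detections, alpha=0.5):
    # Second module (assumed form): filter each remainder landmark's position.
    return [smooth(p, d, alpha) for p, d in zip(prev_estimates, detections)]

def pose_2d(template, observed):
    # Third module (assumed form): least-squares 2D rigid fit that maps
    # template landmarks onto observed landmarks. Returns (theta, (tx, ty)).
    n = len(template)
    tcx = sum(x for x, _ in template) / n
    tcy = sum(y for _, y in template) / n
    ocx = sum(x for x, _ in observed) / n
    ocy = sum(y for _, y in observed) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(template, observed):
        ax, ay = px - tcx, py - tcy   # centered template point
        bx, by = qx - ocx, qy - ocy   # centered observed point
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the rotated template centroid onto the observed centroid.
    tx = ocx - (c * tcx - s * tcy)
    ty = ocy - (s * tcx + c * tcy)
    return theta, (tx, ty)

def determine_pose(template, first_actual, prev_estimates, detections, alpha=0.5):
    # The first landmark is used as measured; the remainder landmarks are filtered.
    remainder = estimate_remainder(prev_estimates, detections, alpha)
    return pose_2d(template, [first_actual] + remainder)
```

In this sketch, template lists the landmark coordinates on the object model with the first landmark at index 0, and prev_estimates holds the filtered remainder positions from the previous frame. A smaller alpha trusts the history more, while alpha = 1.0 disables the smoothing entirely.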