A computer-assisted technique for constructing a three-dimensional model
on top of one or more images (e.g., photographs) such that the model's
parameters automatically match those of the real-world object depicted in
the image(s). Camera parameters such as focal length, position, and
orientation in space may be determined from the images such that the
projection of the three-dimensional model through the calculated camera
parameters matches the projection of the real-world object through the
camera onto the image surface. Modeling is accomplished using primitives,
such as boxes or pyramids, which may be intuitively manipulated to
construct the three-dimensional model on a video display or other display
screen of a computer system with a two-dimensional input controller
(e.g., a mouse or joystick) such that the displayed three-dimensional
object manipulation emulates physical three-dimensional object
manipulation. Camera and primitive parameters are incrementally updated
to provide visual feedback on the effect of additional constraints on the
three-dimensional model, making apparent which user action may have been
responsible for any failure to provide a modeling solution and, thus,
allowing for rapid reversal and correction thereof. Surface properties
(i.e., textures) may be extracted from the images for use in the
three-dimensional model.
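
The matching condition described above, in which the projection of the
model through the calculated camera parameters coincides with the image
of the real-world object, can be illustrated with a minimal sketch. It
assumes an ideal pinhole camera with no lens distortion and a principal
point at the image origin, and it is not the method of the disclosure
itself. The names project, box_corners, rotation_y, and
reprojection_error are hypothetical helpers introduced only for this
illustration, and the "observed" image points are synthesized rather
than taken from a real photograph.

```python
import math

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def rotation_y(angle):
    """World-to-camera rotation for a turn of `angle` radians about the y axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def project(point, focal_length, rotation, position):
    """Project a world-space point through an assumed ideal pinhole camera.

    `rotation` maps world axes to camera axes; `position` is the camera
    center in world coordinates. Returns (u, v) on the image surface.
    """
    # Express the point relative to the camera center, then rotate it
    # into the camera's coordinate frame.
    d = [point[i] - position[i] for i in range(3)]
    x, y, z = (sum(rotation[r][i] * d[i] for i in range(3)) for r in range(3))
    if z <= 0:
        raise ValueError("point is behind the camera")
    # Perspective divide scaled by the focal length.
    return (focal_length * x / z, focal_length * y / z)


def box_corners(center, size):
    """Return the eight corners of an axis-aligned box primitive."""
    cx, cy, cz = center
    hx, hy, hz = (s / 2.0 for s in size)
    return [
        (cx + sx * hx, cy + sy * hy, cz + sz * hz)
        for sx in (-1, 1)
        for sy in (-1, 1)
        for sz in (-1, 1)
    ]


def reprojection_error(model_points, image_points, focal_length, rotation, position):
    """Root-mean-square distance between projected model points and image points."""
    total = 0.0
    for p, (u, v) in zip(model_points, image_points):
        pu, pv = project(p, focal_length, rotation, position)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return math.sqrt(total / len(model_points))


# A box primitive placed in front of the camera (illustrative values).
corners = box_corners(center=(0.0, 0.0, 6.0), size=(2.0, 2.0, 2.0))

# Synthesize "observed" image points using an assumed true camera.
true_focal = 800.0
true_rotation = rotation_y(0.1)
true_position = (0.5, 0.0, 0.0)
observed = [project(p, true_focal, true_rotation, true_position) for p in corners]

# The error vanishes for the true parameters and grows for a wrong guess.
error_at_truth = reprojection_error(corners, observed, true_focal, true_rotation, true_position)
error_at_guess = reprojection_error(corners, observed, 700.0, true_rotation, true_position)
print(error_at_truth, error_at_guess)
```

In this sketch, a solver that adjusts the camera and primitive
parameters would aim to drive reprojection_error toward zero; reporting
the error after each added constraint is one possible way to give the
kind of incremental feedback described above.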