An apparatus and method for generating object-labeled images based on query images
in a video sequence are provided. A video sequence is divided into a plurality
of shots, each of which consists of a set of frames showing a similar scene, and
an initial object region is extracted from each shot by determining whether
an object image exists in the key frames of that shot. Based on the initial object
region extracted from each of the key frames, object regions are tracked in all
frames of the shots. Then, the object regions are labeled to generate object-labeled
images. As a result, the object-labeled image generating apparatus and method can
be applied regardless of the degree of motion of an object, and the time required
to extract query objects is reduced.
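
The shot-division and labeling steps summarized above can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes that a shot boundary is detected by thresholding the difference between intensity histograms of consecutive frames, and that each tracked object region is an axis-aligned box. The names `histogram`, `split_into_shots`, and `label_image`, the bin count, and the threshold value are all hypothetical and chosen only for illustration. Key-frame selection and region tracking are omitted.

```python
from typing import List, Tuple

Frame = List[List[int]]  # grayscale image, pixel values 0..255 (assumed format)


def histogram(frame: Frame, bins: int = 8) -> List[float]:
    """Return the normalized intensity histogram of one frame."""
    counts = [0] * bins
    total = 0
    for row in frame:
        for pixel in row:
            counts[min(pixel * bins // 256, bins - 1)] += 1
            total += 1
    return [c / total for c in counts]


def split_into_shots(frames: List[Frame],
                     threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) frame index ranges, one per shot.

    Assumption: a new shot starts when the L1 distance between the
    histograms of consecutive frames (range 0..2) exceeds `threshold`.
    """
    if not frames:
        return []
    shots = []
    start = 0
    prev = histogram(frames[0])
    for i in range(1, len(frames)):
        cur = histogram(frames[i])
        distance = sum(abs(a - b) for a, b in zip(prev, cur))
        if distance > threshold:
            shots.append((start, i - 1))
            start = i
        prev = cur
    shots.append((start, len(frames) - 1))
    return shots


def label_image(height: int, width: int,
                regions: List[Tuple[int, int, int, int, int]]) -> List[List[int]]:
    """Build an object-labeled image; 0 marks background.

    Each region is (label, top, left, bottom, right), with bottom and
    right exclusive. Boxes are an assumed stand-in for tracked regions.
    """
    labels = [[0] * width for _ in range(height)]
    for label, top, left, bottom, right in regions:
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                labels[y][x] = label
    return labels


# Usage: two dark frames followed by three bright frames form two shots.
dark = [[10] * 4 for _ in range(4)]
bright = [[240] * 4 for _ in range(4)]
shots = split_into_shots([dark, dark, bright, bright, bright])
labeled = label_image(3, 4, [(1, 0, 1, 2, 3)])
```

In this sketch the first frame of each `(start, end)` range could serve as the key frame of the shot, although the abstract does not state how key frames are chosen.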