An image processing apparatus and method include assigning the same label
number to adjacent black pixels that form a group of black pixels in binary
read data, the label number assigned to each group being unique;
determining, when the number of black pixels in a group, counted on the
basis of its unique label number, is less than a predetermined number, that
the black pixels having that label number are noise, and removing those
black pixels from the targets to be processed; acquiring coordinate
information on at least one of the black pixels that remain after the noise
has been removed; and extracting a document area on the basis of the
acquired coordinate information.
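The labeling, noise-removal, and extraction steps can be illustrated with a minimal sketch. It assumes the binary read data is a list of rows with 1 for a black pixel and 0 for white, treats "adjacent" as 8-connectivity, and takes the document area to be the bounding box of all surviving black pixels. The function names and the `min_pixels` parameter (standing in for the predetermined number) are hypothetical and are not taken from the source.

```python
from collections import deque


def label_components(img):
    """Assign a unique label number to each 8-connected group of black pixels (1s).

    Returns the label image (0 = background) and a dict of label -> pixel count.
    """
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    sizes = {}
    next_label = 1
    for y in range(h):
        for x in range(w):
            if img[y][x] == 1 and labels[y][x] == 0:
                # Breadth-first flood fill from an unlabeled black pixel.
                queue = deque([(y, x)])
                labels[y][x] = next_label
                count = 0
                while queue:
                    cy, cx = queue.popleft()
                    count += 1
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and img[ny][nx] == 1 and labels[ny][nx] == 0):
                                labels[ny][nx] = next_label
                                queue.append((ny, nx))
                sizes[next_label] = count
                next_label += 1
    return labels, sizes


def extract_document_area(img, min_pixels):
    """Return (x_min, y_min, x_max, y_max) of the black pixels kept after
    noise removal, or None if every group was discarded as noise."""
    labels, sizes = label_components(img)
    # Groups with fewer than min_pixels black pixels are treated as noise.
    kept = {lab for lab, n in sizes.items() if n >= min_pixels}
    coords = [(x, y) for y, row in enumerate(labels)
              for x, lab in enumerate(row) if lab in kept]
    if not coords:
        return None
    xs = [c[0] for c in coords]
    ys = [c[1] for c in coords]
    return min(xs), min(ys), max(xs), max(ys)


if __name__ == "__main__":
    # Two isolated single-pixel specks around a 2x2 block of "content".
    img = [
        [1, 0, 0, 0, 0, 0],
        [0, 0, 1, 1, 0, 0],
        [0, 0, 1, 1, 0, 0],
        [0, 0, 0, 0, 0, 1],
    ]
    print(extract_document_area(img, min_pixels=2))  # (2, 1, 3, 2)
```

With `min_pixels=1` nothing is discarded and the area grows to `(0, 0, 5, 3)` because the two specks are included. This shows why removing small groups before taking coordinates keeps dust or scanner noise from inflating the extracted area.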