In learning for pattern recognition, a set of object image data of
different types is input, and local features having given geometric
structures are detected from each item of the input object image data. The
detected local features are clustered, a plurality of representative
local features are selected based on results of the clustering, and a
learning data set containing the selected representative local features
as supervisor data is used to recognize or detect an object that
corresponds to the object image data. The learning thus makes it possible
to appropriately extract, from an aggregation of images, local features
useful for detection and recognition of subjects of different categories.
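
The clustering and selection steps described above can be illustrated with a minimal sketch. It assumes that each detected local feature has already been reduced to a fixed-length numeric vector, that k-means is the clustering method, and that the representative of each cluster is the detected feature nearest the cluster centroid. None of these choices is stated in the text, and the names kmeans and select_representatives are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means (an assumed clustering method, not named in the text).

    Returns (centroids, labels), where labels[i] is the cluster of points[i]
    and each non-empty cluster's centroid is the mean of its members.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every feature to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def select_representatives(features, k):
    """Return indices of one representative feature per non-empty cluster.

    Assumption: the representative is the detected feature closest to the
    cluster centroid, so it is always an actual feature from the input.
    """
    centroids, labels = kmeans(features, k)
    representatives = []
    for c, centroid in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes no representative
        best = min(members, key=lambda i: squared_distance(features[i], centroid))
        representatives.append(best)
    return representatives


# Hypothetical feature vectors pooled from images of two object types.
features = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
representative_indices = select_representatives(features, k=2)
supervisor_data = [features[i] for i in representative_indices]
```

In this sketch the selected vectors stand in for the supervisor data of the learning data set. The detection of local features and the subsequent training of a recognizer are outside its scope.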