Provided is a document format identification apparatus capable of correctly identifying
a document format even in an environment where multiple types of scanners are
used. The document format identification apparatus includes: an extraction unit
for extracting a feature from a document whose format is to be identified; a
generation unit for generating, based on the feature extracted by the extraction
unit, document format data containing identification data for identifying a document
format and correction information for correcting a feature difference caused
by a difference in the type of image input apparatus (scanner); and an identification
unit for correcting, based on the correction information, both the document format
data stored in a storage unit and the document format data generated by the generation
unit, and for identifying the format of the document to be identified by comparing
the corrected document format data.
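The correct-then-compare idea can be illustrated with a minimal sketch. It rests on several assumptions that are not part of the description above: features are plain numeric vectors (for example, ruled-line positions in pixels), each scanner's distortion is modeled as a linear `(scale, offset)` pair serving as the correction information, and identification is a nearest-neighbor match under a distance threshold. The names `FormatData`, `normalize`, `distance`, and `identify` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FormatData:
    # Hypothetical stand-in for "document format data".
    name: Optional[str]              # identification data (format label); None for an unknown document
    features: List[float]            # assumed feature vector, e.g. ruled-line positions in pixels
    correction: Tuple[float, float]  # assumed (scale, offset) of the scanner that produced the features


def normalize(data: FormatData) -> List[float]:
    # Undo the assumed scanner distortion x_scanned = scale * x_ref + offset,
    # mapping features from any scanner type onto a common reference scale.
    scale, offset = data.correction
    return [(x - offset) / scale for x in data.features]


def distance(a: List[float], b: List[float]) -> float:
    # Euclidean distance; vectors of different length never match.
    if len(a) != len(b):
        return float("inf")
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def identify(query: FormatData, stored: List[FormatData], threshold: float) -> Optional[str]:
    # Correct both the query and every stored entry, then pick the closest format.
    q = normalize(query)
    best_name, best_dist = None, float("inf")
    for entry in stored:
        d = distance(q, normalize(entry))
        if d < best_dist:
            best_name, best_dist = entry.name, d
    # Reject the match if even the closest format is too far away.
    return best_name if best_dist <= threshold else None


if __name__ == "__main__":
    # Formats registered on a reference scanner (identity correction).
    stored = [
        FormatData("invoice", [100.0, 200.0, 300.0], (1.0, 0.0)),
        FormatData("receipt", [50.0, 150.0, 400.0], (1.0, 0.0)),
    ]
    # The same invoice read on a second scanner that stretches by 2% and shifts by 5 px.
    query = FormatData(None, [107.0, 209.0, 311.0], (1.02, 5.0))
    print(identify(query, stored, threshold=1.0))
```

Without the correction step, the raw query `[107, 209, 311]` lies about 15.8 units from the stored invoice and would fail the threshold. After both sides are mapped back to the reference scale, the two vectors coincide. A real apparatus would likely use richer features and a nonlinear correction model; the linear model here is only for illustration.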