A language, such as English, or a language group, such as an Asian language
group, is recognized from document image data. The document image
data is processed to determine a minimal circumscribing rectangle for
each character. The layout characteristics of the minimal circumscribing
rectangles are quantized into a discrete number of ranges. The layout
characteristic information includes a ratio based on the height and width
of the minimal circumscribing rectangle, as well as the black pixel
density within that rectangle. Based upon the
quantized layout characteristic information, an occurrence probability
of a predetermined number of characters is determined using training data
for a predetermined number of languages. The occurrence probability is
stored in a table for later reference when the language of an unknown
input is to be recognized.
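
The steps described above can be illustrated with a minimal Python sketch. It is not the claimed implementation. It assumes each character is already available as a small binary grid (1 = black pixel), that the ratio is height divided by width, and that the occurrence probability is estimated per single character rather than per character sequence. The range boundaries `RATIO_EDGES` and `DENSITY_EDGES` and all function names are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical range boundaries (assumptions, not taken from the source).
RATIO_EDGES = (0.75, 1.5)    # splits height/width ratio into 3 ranges
DENSITY_EDGES = (0.4, 0.7)   # splits black pixel density into 3 ranges


def bounding_box(glyph):
    """Return (top, left, bottom, right) of the black pixels in a binary grid."""
    rows = [r for r, row in enumerate(glyph) if any(row)]
    cols = [c for c in range(len(glyph[0])) if any(row[c] for row in glyph)]
    return rows[0], cols[0], rows[-1], cols[-1]


def layout_features(glyph):
    """Return (height/width ratio, black pixel density) of the bounding box."""
    top, left, bottom, right = bounding_box(glyph)
    height = bottom - top + 1
    width = right - left + 1
    black = sum(glyph[r][c]
                for r in range(top, bottom + 1)
                for c in range(left, right + 1))
    return height / width, black / (height * width)


def quantize(value, edges):
    """Return the index of the range that contains value."""
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)


def feature_code(glyph):
    """Map a character image to a pair of discrete range indices."""
    ratio, density = layout_features(glyph)
    return quantize(ratio, RATIO_EDGES), quantize(density, DENSITY_EDGES)


def build_table(training):
    """Estimate, per language, the occurrence probability of each code.

    training maps a language name to a list of character images.
    """
    table = {}
    for language, glyphs in training.items():
        counts = Counter(feature_code(g) for g in glyphs)
        total = sum(counts.values())
        table[language] = {code: n / total for code, n in counts.items()}
    return table


# Toy training data: a tall thin stroke and a squarish blob.
tall = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
square = [[1, 1], [1, 0]]
table = build_table({"A": [tall, tall, square], "B": [square]})
```

The resulting `table` plays the role of the stored lookup table: for an unknown input, the same `feature_code` values could be computed and compared against each language's stored probabilities.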