A system and method for identifying and/or categorizing similarly formed
paragraphs in a digital image are set forth. An exemplary system includes
a processor and a memory. The memory stores executable components which,
when executed, direct the system to perform the following: obtain at least one page
image of reflowable textual content and identify at least one paragraph
of textual content. Thereafter, for each identified paragraph, a
plurality of paragraph metrics regarding the identified paragraph is
determined. Based on the paragraph metrics, similarly formed paragraphs
are clustered.
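
By way of illustration only, the clustering step summarized above may be sketched as follows. The abstract does not specify which paragraph metrics are used or how the clustering is performed, so everything in this sketch is an assumption: the metric names (left_indent, first_line_indent, line_height, width), the tolerance value, and the simple greedy grouping are hypothetical stand-ins and not the disclosed method.

```python
from dataclasses import dataclass, astuple
from typing import List


@dataclass(frozen=True)
class ParagraphMetrics:
    """Hypothetical per-paragraph metrics, in page-image units (e.g. pixels).

    These fields are illustrative assumptions; the source does not name
    the metrics it relies on.
    """
    left_indent: float
    first_line_indent: float
    line_height: float
    width: float


def distance(a: ParagraphMetrics, b: ParagraphMetrics) -> float:
    """Largest absolute difference across all metrics (Chebyshev distance)."""
    return max(abs(x - y) for x, y in zip(astuple(a), astuple(b)))


def cluster_paragraphs(
    metrics: List[ParagraphMetrics], tolerance: float = 2.0
) -> List[List[int]]:
    """Greedily group paragraph indices whose metrics agree within `tolerance`.

    Each paragraph joins the first cluster whose representative (its first
    member) is within tolerance; otherwise it starts a new cluster.
    """
    clusters: List[List[int]] = []
    for idx, m in enumerate(metrics):
        for cluster in clusters:
            if distance(metrics[cluster[0]], m) <= tolerance:
                cluster.append(idx)
                break
        else:
            # No existing cluster was close enough.
            clusters.append([idx])
    return clusters


if __name__ == "__main__":
    paragraphs = [
        ParagraphMetrics(72.0, 90.0, 14.0, 468.0),    # body text
        ParagraphMetrics(72.0, 90.5, 14.0, 467.0),    # body text, slight noise
        ParagraphMetrics(108.0, 108.0, 12.0, 396.0),  # indented block quote
        ParagraphMetrics(72.0, 89.0, 14.0, 468.5),    # body text
    ]
    print(cluster_paragraphs(paragraphs))  # prints [[0, 1, 3], [2]]
```

In this sketch, the three body-text paragraphs fall into one cluster and the block quote into another. A greedy, representative-based grouping is chosen only for brevity; an embodiment could equally use any other clustering technique over the same metric vectors.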