Method and apparatus for removing lines of extraneous text from a
document. Similarities are identified between lines of text on each page
and corresponding lines on a selected subset of pages. Different weight
values are associated with different line numbers on a page, each weight
value indicating how likely a line of text at that line number is to
contain extraneous text. One or more lines of text are selectively
removed from a page as a function of the similarities and of the weight
values associated with the line numbers of those lines.
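
The following is a minimal Python sketch of the kind of procedure the abstract describes, not the claimed implementation. Everything specific in it is an assumption made for illustration: the function names (line_similarity, position_weight, edge_index, remove_extraneous_lines), the use of difflib's SequenceMatcher as the similarity measure, the choice of neighboring pages as the "selected subset", the linear weighting that favors lines near the top and bottom of a page, and the rule that multiplies mean similarity by weight and compares the product to a threshold.

```python
from difflib import SequenceMatcher
from statistics import mean


def line_similarity(a, b):
    """Similarity in [0, 1] between two lines, ignoring surrounding whitespace."""
    return SequenceMatcher(None, a.strip(), b.strip()).ratio()


def position_weight(i, n, margin=3):
    """Hypothetical weight for line i of an n-line page.

    1.0 at the first and last line, falling linearly to 0.0 once the line
    is `margin` or more lines away from the nearer page edge.
    """
    distance_from_edge = min(i, n - 1 - i)
    return max(0.0, 1.0 - distance_from_edge / margin)


def edge_index(i, n):
    """Index of line i counted from the nearer page edge.

    A negative result counts from the bottom, so lines near the bottom
    still line up when pages have different lengths.
    """
    return i if i < n - i else i - n


def remove_extraneous_lines(pages, neighbors=2, threshold=0.5, margin=3):
    """Return a copy of `pages` (a list of lists of lines) with likely
    extraneous lines removed.

    Assumed rule: score = mean similarity to the corresponding lines on up
    to `neighbors` pages on each side, times the position weight; a line is
    removed when its score reaches `threshold`.
    """
    cleaned = []
    for p, page in enumerate(pages):
        lo, hi = max(0, p - neighbors), min(len(pages), p + neighbors + 1)
        subset = [pages[q] for q in range(lo, hi) if q != p]
        kept = []
        for i, line in enumerate(page):
            idx = edge_index(i, len(page))
            sims = [
                line_similarity(line, other[idx])
                for other in subset
                if -len(other) <= idx < len(other)
            ]
            weight = position_weight(i, len(page), margin)
            score = mean(sims) * weight if sims else 0.0
            if score < threshold:
                kept.append(line)
        cleaned.append(kept)
    return cleaned


# Illustrative input: a repeated first line and a near-identical last line.
pages = [
    ["ACME Report", "Intro text about cats.", "More on cats here.",
     "Cats sleep a lot.", "Page 1"],
    ["ACME Report", "Dogs are different.", "They bark loudly.",
     "Dogs need walks.", "Page 2"],
    ["ACME Report", "Birds can fly.", "Some birds sing.",
     "Parrots may talk.", "Page 3"],
]
cleaned = remove_extraneous_lines(pages)
```

In this sketch a line is dropped only when it both resembles its counterparts on nearby pages and sits at a line number with a high weight, so a sentence that happens to repeat in the middle of a page is kept. On the sample input, the repeated first line and the "Page N" line are removed from each page and the three lines between them are retained.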