Page numbering of images of pages in a document includes extracting all
numbers that differ by exactly one from a number found on an adjacent
page, and grouping the extracted numbers into a set of sequences that
describe the candidate page numbers in the document. The sequences most
likely to contain candidates that represent the actual page numbers are
determined by merging the most reliable sequences together to bridge gaps
between the sequences, and identifying those gaps where the page numbers
have been intentionally omitted. Each page image is labeled with the
number that is determined to be most likely to represent its actual page
number. Page numbering is abandoned when too few page numbers can be
extracted or assigned relative to the total number of pages in the
document.
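
The steps summarized above can be illustrated with a minimal sketch. It is
not the claimed method, only one simplified reading of it under stated
assumptions: each page image has already been reduced to the set of
integers recognized on it, page numbers increase by one per page, a
sequence is a run of consecutive pages sharing the same offset between
number and page index, longer sequences are treated as more reliable, and
a gap is bridged only when the sequences on both sides agree on the
numbering. The function names and the `min_coverage` threshold are
hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def extract_candidates(pages: List[Set[int]]) -> List[Set[int]]:
    """Keep n on page i only if n-1 is on page i-1 or n+1 is on page i+1."""
    kept = []
    for i, nums in enumerate(pages):
        prev_nums = pages[i - 1] if i > 0 else set()
        next_nums = pages[i + 1] if i + 1 < len(pages) else set()
        kept.append({n for n in nums
                     if (n - 1) in prev_nums or (n + 1) in next_nums})
    return kept


def group_sequences(cands: List[Set[int]]) -> List[Tuple[int, int, int]]:
    """Group candidates into runs: (start_index, first_number, length)."""
    sequences = []
    open_runs: Dict[int, Tuple[int, int]] = {}  # offset -> (start, length)
    for i, nums in enumerate(cands):
        offsets_here = {n - i for n in nums}
        # Close every run whose offset does not continue on this page.
        for off in list(open_runs):
            if off not in offsets_here:
                start, length = open_runs.pop(off)
                sequences.append((start, start + off, length))
        # Extend or open a run for each offset seen on this page.
        for off in offsets_here:
            start, length = open_runs.get(off, (i, 0))
            open_runs[off] = (start, length + 1)
    for off, (start, length) in open_runs.items():
        sequences.append((start, start + off, length))
    return sorted(sequences)


def label_pages(pages: List[Set[int]],
                min_coverage: float = 0.5) -> Optional[List[Optional[int]]]:
    """Label pages, or return None when too few pages could be labeled."""
    cands = extract_candidates(pages)
    # Assumption: longer sequences are more reliable, so place them first.
    seqs = sorted(group_sequences(cands), key=lambda s: -s[2])
    labels: List[Optional[int]] = [None] * len(pages)
    for start, first_number, length in seqs:
        span = range(start, start + length)
        if all(labels[i] is None for i in span):
            for k, i in enumerate(span):
                labels[i] = first_number + k
    # Bridge gaps whose neighbors on both sides imply the same numbering,
    # treating the gap as pages whose printed numbers were omitted.
    known = [i for i, v in enumerate(labels) if v is not None]
    for a, b in zip(known, known[1:]):
        if b - a > 1 and labels[b] - labels[a] == b - a:
            for i in range(a + 1, b):
                labels[i] = labels[a] + (i - a)
    coverage = sum(v is not None for v in labels) / max(len(pages), 1)
    return labels if coverage >= min_coverage else None
```

For example, given the per-page sets `[{2019}, {1, 2019}, {2}, set(),
{4, 17}, {5}]`, the stray numbers 2019 and 17 are discarded because no
adjacent page carries a neighboring value, the two surviving sequences
(1, 2 and 4, 5) are bridged across the unnumbered fourth page, and the
first page is left unlabeled.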