A system is presented for scanning an entire book or document at once
using an adaptive process in which the book or document contains both known
and unknown fonts. The known fonts are processed through a verification
system where sure words and error words are determined. Both the sure
words and error words are sent to OCR training where they are re-OCR'ed
and repeatedly verified until they meet predetermined quality criteria.
Characters or words not meeting the predetermined quality criteria
receive additional OCR training until all the characters and words pass
the predetermined quality criteria. Unknown fonts are scanned and
clustered together by shape. Outliers among the shapes are keyed in
manually. The manually classified symbols go to OCR training and then
to the known type optimization process.
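
The two paths described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the claimed implementation: `ToyOCR`, `optimize_known_font`, `cluster_by_shape`, and `split_outliers` are hypothetical names, the lexicon lookup stands in for the verification system, the fraction of sure words stands in for the predetermined quality criteria, and a greedy distance-threshold grouping stands in for whatever shape clustering the system actually uses.

```python
from math import dist


class ToyOCR:
    """Hypothetical stand-in engine that misreads the characters in `confusions`."""

    def __init__(self, confusions):
        self.confusions = dict(confusions)

    def recognize(self, glyphs):
        # Apply each known confusion to simulate OCR errors.
        return "".join(self.confusions.get(ch, ch) for ch in glyphs)

    def train(self, sure_words, error_words):
        # Stand-in for OCR training: each round removes one confusion.
        # A real trainer would learn from both word lists; this toy ignores them.
        if self.confusions:
            self.confusions.pop(next(iter(self.confusions)))


def optimize_known_font(engine, scanned_words, lexicon, threshold=0.95, max_rounds=10):
    """Re-OCR and verify until the share of sure words meets the threshold."""
    quality = 0.0
    for round_no in range(1, max_rounds + 1):
        results = [engine.recognize(w) for w in scanned_words]
        sure = [r for r in results if r in lexicon]        # verified words
        errors = [r for r in results if r not in lexicon]  # words failing verification
        quality = len(sure) / len(results) if results else 1.0
        if quality >= threshold:
            return round_no, quality
        engine.train(sure, errors)  # both lists go to training
    return max_rounds, quality


def cluster_by_shape(shapes, radius):
    """Greedy grouping: a shape joins the first cluster whose seed lies within `radius`."""
    clusters = []  # each cluster is a list of indices; the first index is the seed
    for i, shape in enumerate(shapes):
        for cluster in clusters:
            if dist(shapes[cluster[0]], shape) <= radius:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def split_outliers(clusters, min_size):
    """Clusters smaller than `min_size` are treated as outliers to be keyed in manually."""
    kept = [c for c in clusters if len(c) >= min_size]
    outliers = [i for c in clusters if len(c) < min_size for i in c]
    return kept, outliers


if __name__ == "__main__":
    engine = ToyOCR({"a": "o", "d": "b"})
    words = ["cat", "dog", "bird"]
    print(optimize_known_font(engine, words, set(words)))

    shapes = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (9, 0)]
    print(split_outliers(cluster_by_shape(shapes, radius=0.5), min_size=2))
```

In this sketch the shape feature vectors are plain coordinate tuples. In practice they would be derived from the scanned glyph images, and the manually keyed outliers would be fed back into training before the known-font loop runs on them.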