A system/method is presented for scanning entire books or documents all at
once using an adaptive process where the book or document has known fonts
and unknown fonts. The known fonts are processed through a verification
system where sure words and error words are determined. Both the sure
words and error words are sent to OCR training where they are re-OCR'ed
and repeatedly verified until they meet predetermined quality criteria.
Characters or words not meeting the predetermined quality criteria receive
additional OCR training until all the characters and words pass the
predetermined quality criteria. Unknown fonts are scanned and clustered
together by shape. Outliers in the shapes are manually keyed in. Those
symbols that are manually classified go to OCR training and then to the
known type optimization process.
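
The two paths described above can be illustrated with a minimal sketch. It is not the claimed implementation, and everything named in it is hypothetical: ToyOcr stands in for a real OCR engine, verification is assumed to be a simple lexicon lookup, the quality criterion is assumed to be a minimum share of sure words, key_in stands in for manual keying, and the greedy distance-threshold clustering is only one possible way to group symbols by shape.

```python
from dataclasses import dataclass, field


@dataclass
class ToyOcr:
    """Hypothetical stand-in OCR engine: a table from glyph shape to character."""
    model: dict = field(default_factory=dict)

    def recognize(self, glyphs):
        # Unknown shapes are rendered as "?".
        return "".join(self.model.get(g, "?") for g in glyphs)

    def train(self, glyphs, text):
        # Pair each glyph shape with the character it should produce.
        self.model.update(zip(glyphs, text))


def verify(ocr, words, lexicon):
    """Split words into sure words (OCR output is in the lexicon) and error words."""
    sure, errors = [], []
    for glyphs in words:
        (sure if ocr.recognize(glyphs) in lexicon else errors).append(glyphs)
    return sure, errors


def optimize_known_fonts(ocr, words, lexicon, key_in, min_sure=1.0, max_rounds=5):
    """Train, re-OCR and re-verify until the share of sure words reaches min_sure."""
    if not words:
        return [], []
    for _ in range(max_rounds):
        sure, errors = verify(ocr, words, lexicon)
        if len(sure) / len(words) >= min_sure:  # assumed quality criterion
            break
        for glyphs in sure:
            ocr.train(glyphs, ocr.recognize(glyphs))  # reinforce sure words
        for glyphs in errors:
            ocr.train(glyphs, key_in(glyphs))  # corrected text for error words
    return verify(ocr, words, lexicon)


def cluster_by_shape(features, radius):
    """Greedy clustering: a symbol joins the first cluster whose seed is within radius.

    Returns (groups, outliers); outliers are symbols left alone in a cluster,
    which would be keyed in manually.
    """
    clusters = []  # each cluster is a list of indices; the first index is the seed
    for i, feat in enumerate(features):
        for cluster in clusters:
            seed = features[cluster[0]]
            dist = sum((a - b) ** 2 for a, b in zip(feat, seed)) ** 0.5
            if dist <= radius:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    groups = [c for c in clusters if len(c) > 1]
    outliers = [c[0] for c in clusters if len(c) == 1]
    return groups, outliers
```

In this sketch, symbols returned as outliers by cluster_by_shape would be keyed in by an operator, passed to ToyOcr.train, and then handled by optimize_known_fonts like any other known-font text.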