A system for electronically distilling information from a business
document uses a network scanner to electronically scan a platen area,
having a business document thereon, to create a bitmap. A network server
carries out a segmentation process to segment the scan-generated bitmap
into a bitmap object, the bitmap object corresponding to the scanned
business document; a bitmap to text conversion process to convert the
bitmap object into a block of text; a semantic recognition process to
generate a structured representation of semantic entities corresponding
to the scanned business document; and a document generation process to
convert the structured representation into a structured text file. The
semantic recognition process includes the processes of generating, for
each line of text having a keyword therein, a terminal symbol
corresponding to the keyword therein; generating, for each line of text
not having a keyword therein and having no numeric character therein, an
alphabetic terminal symbol; generating, for each line of text not having
a keyword therein and having a numeric character therein, an alphanumeric
terminal symbol; generating a string of terminal symbols from the
generated terminal symbols; determining a probable parsing of the
generated string of terminal symbols; labeling each text line, according
to a determined function, with non-terminal symbols; and parsing the
business document information text into fields of business document
information text based upon the non-terminal symbol of each text line and
the determined probable parsing of the generated string of terminal
symbols.
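
The terminal-symbol generation step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the claimed implementation: the keyword table, the symbol names (PHONE, FAX, EMAIL, ALPHA, ALNUM), and the function names are hypothetical, since the abstract does not specify them. The probable-parsing and labeling steps are not shown, because they depend on a grammar the abstract does not give.

```python
import re

# Hypothetical keyword table: the abstract names no keywords or symbols.
KEYWORD_SYMBOLS = {
    "phone": "PHONE",
    "fax": "FAX",
    "email": "EMAIL",
}

# Hypothetical names for the two non-keyword terminal symbols.
ALPHABETIC = "ALPHA"
ALPHANUMERIC = "ALNUM"


def terminal_symbol(line: str) -> str:
    """Map one text line to a terminal symbol using the three rules."""
    lowered = line.lower()
    # Rule 1: a line containing a keyword gets that keyword's symbol.
    for keyword, symbol in KEYWORD_SYMBOLS.items():
        if re.search(rf"\b{re.escape(keyword)}\b", lowered):
            return symbol
    # Rule 3: no keyword, but at least one numeric character.
    if any(ch.isdigit() for ch in line):
        return ALPHANUMERIC
    # Rule 2: no keyword and no numeric characters.
    return ALPHABETIC


def terminal_string(text_block: str) -> list[str]:
    """Build the string of terminal symbols, one per non-blank line."""
    return [
        terminal_symbol(line)
        for line in text_block.splitlines()
        if line.strip()
    ]


card_text = (
    "Jane Doe\n"
    "Senior Engineer\n"
    "123 Main Street\n"
    "Phone: 555 0100\n"
    "Fax: 555 0101"
)
symbols = terminal_string(card_text)
print(symbols)
```

For the sample block of text, the resulting string of terminal symbols is ALPHA, ALPHA, ALNUM, PHONE, FAX. A parser would then consume this sequence to determine a probable parsing and assign non-terminal labels to each line.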