The present invention relates generally to automatically processing
electronic documents. In one aspect, features and/or properties of words
are identified from a set of training documents to aid in extracting
information from documents to be processed. The features and/or
properties relate to the text of the words, the position of the words,
and the relationship of the words to other words. A classifier is
developed to express these
features and/or properties. During information extraction, documents are
processed and analyzed based on the classifier, and information is
extracted based on correspondence between the documents and the
features and/or properties expressed by the classifier.
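
By way of illustration only, the flow described above (identifying word features from training documents, expressing them in a classifier, and extracting information from a new document) might be sketched as follows. Everything in this sketch is an assumption made for the example rather than a detail of the invention: the function names, the "AMOUNT" and "O" labels, the three features chosen (whether the word is numeric, whether it falls in the first half of the document, and the preceding word), and the simple frequency-based scoring, which stands in for whatever classifier is actually developed.

```python
from collections import Counter, defaultdict

def word_features(words, i):
    """Hypothetical features for the word at index i: its text, its
    position in the document, and its relationship to a neighboring word."""
    word = words[i]
    prev_word = words[i - 1].lower() if i > 0 else "<start>"
    return [
        "numeric=%s" % word.replace(".", "", 1).isdigit(),  # text of the word
        "first_half=%s" % (i < len(words) / 2),              # position of the word
        "prev=%s" % prev_word,                               # relationship to another word
    ]

def train(labeled_docs):
    """Count how often each feature occurs with each label in the training set."""
    counts = defaultdict(Counter)
    totals = Counter()
    for words, labels in labeled_docs:
        for i, label in enumerate(labels):
            totals[label] += 1
            counts[label].update(word_features(words, i))
    return counts, totals

def classify(model, words, i):
    """Pick the label whose training words most often shared this word's features.
    This is a toy scoring rule assumed for the sketch, not a prescribed classifier."""
    counts, totals = model

    def score(label):
        return sum(counts[label][f] / totals[label] for f in word_features(words, i))

    return max(totals, key=score)

def extract(model, text, target="AMOUNT"):
    """Return the words of a new document that the classifier assigns to the target label."""
    words = text.split()
    return [w for i, w in enumerate(words) if classify(model, words, i) == target]

# Hypothetical training documents: each word is labeled "AMOUNT" or "O" (other).
TRAINING = [
    ("Invoice total: 125.50 due now".split(), ["O", "O", "AMOUNT", "O", "O"]),
    ("Order 7 total: 89.00 paid".split(), ["O", "O", "O", "AMOUNT", "O"]),
]

model = train(TRAINING)
print(extract(model, "Receipt total: 42.10 thanks"))  # prints ['42.10']
```

In this sketch the word "42.10" is extracted because it corresponds to the features the training amounts had in common: it is numeric and it follows the word "total:". A practical system would presumably use a richer feature set and a trained statistical classifier.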