An extraction manager extracts information from formatted input. The input
is annotated with presentation information and parsed into a set of
elements comprising a canonical representation thereof. An information
analyzer analyzes the elements in order to glean additional information.
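As an illustration of the parsing and analysis steps, the sketch below treats HTML as the formatted input. That choice, along with the `Element` structure, the `ElementParser` and `analyze` names, and the two derived facts (`emphasized`, `has_digits`), is a hypothetical assumption and not taken from the abstract. Each text run becomes one element annotated with the stack of enclosing tags as its presentation information. The analyzer then records extra facts that later stages could test.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Hypothetical: tags that never get a closing tag, so they must not be pushed.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


@dataclass
class Element:
    """One unit of the canonical representation (hypothetical structure):
    a text run plus the presentation tags in effect where it appeared."""
    text: str
    presentation: tuple
    info: dict = field(default_factory=dict)


class ElementParser(HTMLParser):
    """Hypothetical parser that emits one Element per non-empty text run."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open presentation tags
        self.elements = []   # canonical representation being built

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())  # normalize whitespace
        if text:
            self.elements.append(Element(text, tuple(self.stack)))


def analyze(elements):
    """Hypothetical information analyzer: derives extra facts per element."""
    for el in elements:
        el.info["emphasized"] = any(
            t in ("b", "strong", "h1", "h2") for t in el.presentation
        )
        el.info["has_digits"] = any(c.isdigit() for c in el.text)
    return elements


def parse(markup):
    """Annotate, parse into elements, then analyze them."""
    parser = ElementParser()
    parser.feed(markup)
    parser.close()
    return analyze(parser.elements)
```

For example, `parse("<h1>Invoice</h1><p>Total: <b>$42.00</b></p>")` yields three elements. The last one is `"$42.00"` with presentation `("p", "b")`, and it is marked both emphasized and containing digits.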
An entity extractor determines entities to extract from the input. The
entity extractor analyzes elements according to specific entities to be
extracted and creates entity-specific observations for the analyzed
elements. These observations comprise possible values for the relevant
entities. A heuristics processor maintains a collection of
entity-specific heuristics, each comprising a test to help determine the
suitability of data as a value for the corresponding entity. The
heuristics processor selects heuristics for the entities to be extracted,
and tests observations for these entities against the selected
heuristics. Responsive to this testing, an ordered list of possible values
is determined for each entity to be extracted.
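The observation and heuristics stages can be sketched as follows. Several details are assumptions for illustration and do not come from the abstract. Candidate values are found with one regular expression per entity (`PATTERNS`). Each heuristic is a (test, weight) pair. An observation's score is the sum of the weights of the tests it passes. Elements are reduced to simple `(text, emphasized)` pairs, and the entity names `total` and `date` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Observation:
    entity: str       # entity this observation is a candidate for
    value: str        # possible value for that entity
    context: str      # full text of the element it came from
    emphasized: bool  # presentation hint carried over from the element


# Hypothetical candidate patterns, one per entity to extract.
PATTERNS = {
    "total": re.compile(r"\$\d+(?:\.\d{2})?"),
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}


def extract_observations(elements, entities):
    """Entity extractor: scan each (text, emphasized) element for each entity
    and record every match as an entity-specific observation."""
    observations = []
    for text, emphasized in elements:
        for entity in entities:
            for m in PATTERNS[entity].finditer(text):
                observations.append(Observation(entity, m.group(), text, emphasized))
    return observations


# A heuristic is (test, weight); a passing test adds its weight to the score.
Heuristic = Tuple[Callable[[Observation], bool], float]

HEURISTICS: Dict[str, List[Heuristic]] = {
    "total": [
        (lambda o: "total" in o.context.lower(), 2.0),
        (lambda o: o.emphasized, 1.0),
    ],
    "date": [
        (lambda o: "date" in o.context.lower(), 2.0),
    ],
}


def rank(observations, entities):
    """Heuristics processor: select the heuristics for each entity, test that
    entity's observations against them, and order the values by score."""
    ranked = {}
    for entity in entities:
        selected = HEURISTICS.get(entity, [])
        scored = [
            (sum(w for test, w in selected if test(o)), o.value)
            for o in observations
            if o.entity == entity
        ]
        # Highest score first; the sort is stable, so ties keep input order.
        scored.sort(key=lambda s: s[0], reverse=True)
        ranked[entity] = [value for _, value in scored]
    return ranked
```

With the elements `("Subtotal: $40.00", False)` and `("Total due: $42.00", True)`, both amounts pass the context test, because "subtotal" contains "total". Only the emphasized `$42.00` earns the extra weight, so it ranks first. This shows why, in the scheme described above, the output is an ordered list of possible values rather than a single answer.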