Disclosed are a system architecture, components and a searching technique
for an Unstructured Information Management System (UIMS). The UIMS may be
provided as middleware for the effective management and interchange of
unstructured information over a wide array of information sources. The
architecture generally includes a search engine, data storage, analysis
engines containing pipelined document annotators and various adapters.
The searching technique is a two-level technique.
Also disclosed are a system, method and computer program product to process
document data. The method includes inputting a document and operating at
least one text analysis engine that comprises a plurality of coupled
annotators for tokenizing document data and for identifying and annotating a
particular type of semantic content. Operating the at least one text
analysis engine generates a plurality of views of a document, where each
of the plurality of views is derived from a different tokenization of
the document. The method further includes storing the plurality of views
in a common data structure associated with the document.
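
The multi-view method described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the names DocumentStore, View, Annotation, analyze and both tokenizers are hypothetical, and the "capitalized token" annotator merely stands in for an annotator of some particular type of semantic content. Each view is built from a different tokenization of the same text, and all views are kept in one structure tied to the document.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    begin: int  # character offset where the span starts
    end: int    # character offset just past the span
    kind: str   # annotation type, e.g. "Token" or "Capitalized"


@dataclass
class View:
    name: str
    annotations: list = field(default_factory=list)


@dataclass
class DocumentStore:
    """Hypothetical common data structure: one text, many views."""
    text: str
    views: dict = field(default_factory=dict)


def whitespace_tokenizer(text):
    # Tokenization 1 (assumed): split on whitespace only.
    return [Annotation(m.start(), m.end(), "Token")
            for m in re.finditer(r"\S+", text)]


def word_tokenizer(text):
    # Tokenization 2 (assumed): runs of word characters, punctuation dropped.
    return [Annotation(m.start(), m.end(), "Token")
            for m in re.finditer(r"\w+", text)]


def capitalized_annotator(text, tokens):
    # Stand-in semantic annotator: marks tokens that start with a capital.
    return [Annotation(t.begin, t.end, "Capitalized")
            for t in tokens if text[t.begin].isupper()]


def analyze(text):
    """Run the coupled annotators once per tokenization and store each view."""
    doc = DocumentStore(text)
    tokenizers = (("whitespace", whitespace_tokenizer),
                  ("word", word_tokenizer))
    for name, tokenizer in tokenizers:
        tokens = tokenizer(text)
        doc.views[name] = View(name, tokens + capitalized_annotator(text, tokens))
    return doc


if __name__ == "__main__":
    doc = analyze("U.S. courts")
    for name, view in doc.views.items():
        spans = [(a.kind, doc.text[a.begin:a.end]) for a in view.annotations]
        print(name, spans)
```

Keeping annotations as character offsets into the shared text, rather than as copies of the text, is one way to let several tokenizations coexist without duplicating the document.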