A system and method are disclosed for identifying useless or insignificant
documents in a document hit list assembled from documents stored in one or
more document collection databases. A search engine is used to compose the
document hit list based on a query presented by a user. A text extraction
algorithm run by a processor is then used to process the documents
identified by the document hit list to produce a table of terms and their
corresponding collection-level importance ranking, called the Information
Quotient (IQ). The text extraction algorithm also produces a table of the most
important terms per document. The documents are scanned independently
to produce a table of documents with filenames and lengths. A
summarizing text algorithm, likewise run by a processor, is applied to the
documents of the document hit list to produce a table of terms having a
high tf*idf value for each document. All of the tables are stored in a
relational database, which allows the system of the present invention to
generate a table of terms per document ranked by decreasing IQ. To
determine whether a document is useful or useless, the table of terms and
IQs, the table of most important terms per document, the table of
documents with filenames and lengths, and the table of high tf*idf values
are examined.
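
The table-building and ranking steps described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the table names (term_iq, doc_terms), the function names, the sample IQ values, and the choice of SQLite are all hypothetical, and the tf*idf weighting shown (raw term count times log(N/df)) is one common definition that the text does not itself specify.

```python
import math
import sqlite3
from collections import Counter


def tf_idf(docs):
    """Return {doc_id: {term: score}} with score = count * log(N / df).

    Assumed weighting; the source names tf*idf but gives no formula.
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for terms in docs.values() for term in set(terms))
    scores = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        scores[doc_id] = {t: c * math.log(n / df[t]) for t, c in tf.items()}
    return scores


def build_demo_db():
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE term_iq (term TEXT PRIMARY KEY, iq REAL);
        CREATE TABLE doc_terms (doc_id TEXT, term TEXT);
        """
    )
    # Illustrative values only; real IQs would come from text extraction.
    conn.executemany(
        "INSERT INTO term_iq VALUES (?, ?)",
        [("engine", 80.0), ("piston", 95.0), ("login", 5.0)],
    )
    conn.executemany(
        "INSERT INTO doc_terms VALUES (?, ?)",
        [("a.txt", "engine"), ("a.txt", "piston"), ("b.txt", "login")],
    )
    conn.commit()
    return conn


def terms_by_decreasing_iq(conn):
    """Join per-document terms with collection-level IQ, highest IQ first."""
    return conn.execute(
        """
        SELECT d.doc_id, d.term, t.iq
        FROM doc_terms AS d
        JOIN term_iq AS t ON t.term = d.term
        ORDER BY d.doc_id, t.iq DESC, d.term
        """
    ).fetchall()
```

Calling terms_by_decreasing_iq(build_demo_db()) returns one row per document term, grouped by document and ordered from highest to lowest IQ. A document whose best terms all carry low IQs, such as b.txt in this toy data, is the kind of candidate a usefulness check might flag.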