System and method for identifying useless documents

A system and method are disclosed for identifying useless or insignificant documents in a document hit list assembled from documents stored in one or more document collection databases. A search engine composes the document hit list based on a query presented by a user. A text extraction algorithm run by a processor then processes the documents identified by the hit list to produce a table of terms and their corresponding collection-level importance ranking, called the IQ or Information Quotient. The text extraction algorithm also produces a table of the most important terms per document. The documents are scanned independently to produce a table of documents with filenames and lengths. A summarizing text algorithm is run by a processor against the documents of the hit list to produce a table of terms having a high tf*idf value for each document. All of the tables are stored in a relational database, which allows the system to generate a table of terms per document ranked by decreasing IQ. To determine whether a document is useful or useless, four tables are examined: the table of terms and IQs, the table of most important terms per document, the table of documents with filenames and lengths, and the table of terms with high tf*idf values.
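The abstract does not say how the tf*idf values are computed. As a rough illustration only, the sketch below uses the common textbook weighting (relative term frequency times the logarithm of the number of documents divided by the document frequency) to build a per-document list of high tf*idf terms. The function names `tokenize` and `top_tfidf_terms`, the naive tokenizer, and the cutoff `k` are assumptions made for this example and do not come from the patent.

```python
import math
from collections import Counter

PUNCT = ".,;:!?()\"'"

def tokenize(text):
    # Naive tokenizer (assumption): split on whitespace, strip punctuation, lowercase.
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

def top_tfidf_terms(docs, k=3):
    """Map each filename to its k highest-scoring (term, tf*idf) pairs.

    docs: dict of filename -> document text (a stand-in for the hit list).
    Uses tf = count / document length and idf = log(N / df); this textbook
    formula is an assumption, not the weighting specified by the patent.
    """
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(docs)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for toks in tokens.values():
        df.update(set(toks))

    result = {}
    for name, toks in tokens.items():
        tf = Counter(toks)
        scores = {
            term: (count / len(toks)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        result[name] = ranked[:k]
    return result
```

With a two-document hit list such as `{"a.txt": "apple banana apple", "b.txt": "banana cherry"}`, a term that appears in every document ("banana") scores zero, so each document's list is led by the term unique to it. In a fuller system these rows would be written to a relational table alongside the filename and length table rather than returned as a dictionary.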
