Methods, apparatus and computer program products for information retrieval and document classification utilizing a multidimensional subspace

Methods, apparatus and computer program products are provided for retrieving information from a text data collection and for classifying a document into none, one or more of a plurality of predefined classes. In each aspect, a representation of at least a portion of the original matrix is projected into a lower dimensional subspace, and those portions of the subspace representation that relate to the term(s) of the query are weighted following the projection into the lower dimensional subspace. In order to retrieve the documents that are most relevant with respect to a query, the documents are then scored, with documents having better scores generally being of greater relevance. Alternatively, in order to classify a document, the relationship of the document to the classes of documents is scored, and the document is then classified in those classes, if any, that have the best scores.
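The flow described above (project, weight the query-related portions after projection, then score) can be illustrated with a minimal sketch. Several details are assumptions made only for illustration and are not stated in the abstract: the original matrix is taken to be a term-by-document matrix, the projection is a truncated singular value decomposition, the score is an inner product in the subspace, and "best scores" is modeled as a fixed threshold. The names `project`, `score_documents`, `classify`, `query_weights` and `threshold` are hypothetical.

```python
import numpy as np

def project(A, k):
    """Project a term-by-document matrix A (terms x docs) into a k-dim subspace.

    Assumption: a truncated SVD stands in for the projection step.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    term_vecs = U[:, :k] * s[:k]   # one k-dimensional row per term
    doc_vecs = Vt[:k, :].T         # one k-dimensional row per document
    return term_vecs, doc_vecs

def score_documents(term_vecs, doc_vecs, query_weights):
    """Score every document against a query.

    query_weights maps term index -> weight. The weights are applied to the
    query-term rows of the subspace representation, i.e. after projection.
    """
    idx = np.array(list(query_weights.keys()))
    w = np.array(list(query_weights.values()), dtype=float)
    weighted = w[:, None] * term_vecs[idx]      # weight the query-term rows
    return weighted.sum(axis=0) @ doc_vecs.T    # one score per document

def classify(scores_by_class, threshold):
    """Return every class whose score reaches the (assumed) threshold.

    The result may be empty, so a document can fall into none, one or more classes.
    """
    return [c for c, s in scores_by_class.items() if s >= threshold]

if __name__ == "__main__":
    # Toy data: terms 0-1 occur in documents 0-1, terms 2-3 in documents 2-3.
    A = np.array([[2, 1, 0, 0],
                  [1, 2, 0, 0],
                  [0, 0, 2, 1],
                  [0, 0, 1, 2]], dtype=float)
    term_vecs, doc_vecs = project(A, k=2)
    scores = score_documents(term_vecs, doc_vecs, {0: 1.0})
    print(np.argsort(-scores))  # document indices, best score first
```

In this sketch, ranking the documents by descending score corresponds to the retrieval aspect, while passing per-class scores to `classify` corresponds to the classification aspect; how the per-class scores are computed is left open here because the abstract does not specify it.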
