System and method for identifying query-relevant keywords in documents with latent semantic analysis

A system and method for identifying query-related keywords in documents found in a search using latent semantic analysis. The documents are represented as a document-term matrix M containing one or more document term-weight vectors d, which may be term-frequency (tf) vectors or term-frequency inverse-document-frequency (tf-idf) vectors. This matrix is subjected to a truncated singular value decomposition. The resulting transform matrix U is used to project a query term-weight vector q into the reduced N-dimensional space; the projected vector is then expanded back into the full vector space using the inverse of U, yielding q_expanded. To perform a search, the similarity of q_expanded is measured relative to each candidate document vector in this space. Exemplary similarity functions are dot product and cosine similarity. Keywords are selected as the terms with the highest values in q_expanded that also occur in at least one document. Matching keywords from the query may be highlighted in the search results.
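The steps in the abstract can be illustrated with a minimal sketch in Python using NumPy. It is not the patented implementation. Several details are assumptions made for illustration: M is laid out with one row per term and one column per document, the "inverse of U" is taken to be the transpose of the truncated U (its pseudo-inverse, since its columns are orthonormal), and the function names, the toy vocabulary, and the parameters n_dims and top_k are all hypothetical.

```python
import numpy as np


def expand_query(M, q, n_dims):
    """Project q into an n_dims-dimensional latent space and expand it back.

    Assumption: M is terms-by-documents (one column per document).
    """
    # Thin SVD; singular values come back in descending order.
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    U_n = U[:, :n_dims]        # truncated transform matrix
    q_reduced = U_n.T @ q      # projection into the reduced space
    # Assumption: the transpose stands in for the inverse of the truncated U.
    return U_n @ q_reduced     # expansion back into the full term space


def cosine_similarities(M, v):
    """Cosine similarity between v and every document column of M."""
    norms = np.linalg.norm(M, axis=0) * np.linalg.norm(v)
    return (M.T @ v) / np.where(norms == 0, 1.0, norms)


def select_keywords(M, q_expanded, terms, top_k):
    """Highest-valued expanded terms that occur in at least one document."""
    in_some_doc = (M != 0).any(axis=1)
    order = np.argsort(-q_expanded, kind="stable")
    return [terms[i] for i in order if in_some_doc[i]][:top_k]


# Hypothetical toy corpus: rows are terms, columns are three documents.
terms = ["car", "auto", "engine", "flower", "petal"]
M = np.array([
    [1, 0, 0],   # car
    [0, 1, 0],   # auto
    [1, 1, 0],   # engine
    [0, 0, 1],   # flower
    [0, 0, 1],   # petal
], dtype=float)

q = np.array([1.0, 0.0, 0.0, 0.0, 0.0])   # query containing only "car"

q_expanded = expand_query(M, q, n_dims=2)
sims = cosine_similarities(M, q_expanded)
keywords = select_keywords(M, q_expanded, terms, top_k=3)

print("similarities:", np.round(sims, 3))
print("keywords:", keywords)
```

In this toy data the query contains only "car", yet the expanded vector also gives weight to "auto" and "engine", so the document that mentions only "auto" and "engine" scores as high as the one that contains "car". A tf-idf weighting or a plain dot product could be substituted without changing the overall structure.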
