A system and method for identifying query-related keywords in documents
found in a search using latent semantic analysis. The documents are
represented as a document term matrix M containing one or more document
term-weight vectors d, which may be term-frequency (tf) vectors or
term-frequency inverse-document-frequency (tf-idf) vectors. This matrix
is subjected to a truncated singular value decomposition. The resulting
transform matrix U can be used to project a query term-weight vector q
into the reduced N-dimensional space, followed by its expansion back into
the full vector space using the inverse of U to yield an expanded query
vector q.sub.expanded. To perform a search, the
similarity of q.sub.expanded is measured relative to each candidate
document vector in this space. Exemplary similarity functions are dot
product and cosine similarity. Keywords are selected as the terms with the
highest values in q.sub.expanded that also occur in at least one
document. Matching keywords from the query may be highlighted in the
search results.
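
The steps summarized above can be illustrated with a minimal NumPy sketch. It is not the claimed implementation, and it rests on several assumptions that the abstract does not state: M is laid out with terms as rows and documents as columns, the "inverse of U" is realized as the transpose of the truncated matrix (its pseudo-inverse, since its columns are orthonormal), and the function names, the toy vocabulary, and the tf values are hypothetical.

```python
import numpy as np

def expand_query(M, q, n_dims):
    """Project q into the reduced space and expand it back.

    Assumes M is terms x documents and q has one weight per term.
    """
    # Thin SVD; keeping the first n_dims left singular vectors truncates it.
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    U_k = U[:, :n_dims]
    q_reduced = U_k.T @ q   # projection into the N-dimensional space
    return U_k @ q_reduced  # expansion back into the full term space

def cosine_scores(M, q_expanded):
    """Cosine similarity between q_expanded and each document column of M."""
    dots = M.T @ q_expanded
    denom = np.linalg.norm(M, axis=0) * np.linalg.norm(q_expanded)
    # Documents or queries with zero norm get a score of 0.
    return np.divide(dots, denom, out=np.zeros_like(dots), where=denom > 0)

def select_keywords(M, q_expanded, vocab, top_k):
    """Terms with the highest expanded weights that occur in some document."""
    in_some_doc = (M != 0).any(axis=1)
    order = np.argsort(-q_expanded)  # descending by expanded weight
    return [vocab[i] for i in order if in_some_doc[i]][:top_k]

# Hypothetical toy data: term-frequency weights, terms x documents.
vocab = ["car", "auto", "engine", "flower", "petal"]
M = np.array([
    [1.0, 0.0, 0.0, 0.0],  # car
    [0.0, 1.0, 0.0, 0.0],  # auto
    [1.0, 1.0, 0.0, 0.0],  # engine
    [0.0, 0.0, 1.0, 1.0],  # flower
    [0.0, 0.0, 1.0, 1.0],  # petal
])
q = np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # the query contains only "car"

q_expanded = expand_query(M, q, n_dims=2)
scores = cosine_scores(M, q_expanded)
keywords = select_keywords(M, q_expanded, vocab, top_k=3)
print(q_expanded, scores, keywords)
```

In this toy case the query contains only "car", yet the expanded vector also assigns weight to "auto" and "engine", because those terms co-occur with "car" in the same latent dimension. This is the property that allows query-related keywords to be found in documents that do not contain the literal query term.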