A term-by-document matrix, representing the frequency of occurrence of
each term in each document, is compiled from a corpus of documents
representative of a particular subject matter. A weighted term
dictionary is created using a global weighting algorithm and then applied
to the term-by-document matrix, forming a weighted term-by-document
matrix. A term vector matrix and a singular value concept matrix are
computed by singular value decomposition of the weighted term-by-document
matrix. The k largest singular values are kept and all others are set to
zero, thereby reducing the term vector matrix and the singular value
concept matrix to k concept dimensions. The reduced term vector
matrix, the reduced singular value concept matrix and the weighted term
dictionary can be used to project pseudo-document vectors, representing
documents not appearing in the original corpus, into a representative
semantic space. The similarity of those documents can then be determined
from the relative positions of their respective pseudo-document vectors
in that space.
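The pipeline described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the claimed method: the global weighting algorithm is not specified, so the entropy weight used here is an assumed (though common) choice for latent semantic indexing. The whitespace tokenizer, all function names, the toy corpus and the use of cosine similarity between pseudo-document vectors are likewise hypothetical. Keeping only the first k singular triplets is equivalent to setting the remaining singular values to zero. A new document is folded in as d_hat = d^T U_k S_k^-1, where d is its term-count vector weighted with the same term dictionary.

```python
import math
from collections import Counter

import numpy as np


def build_term_document_matrix(corpus):
    """Count term occurrences per document (rows = terms, columns = documents)."""
    tokenized = [doc.lower().split()  # assumption: naive whitespace tokenizer
                 for doc in corpus]
    vocabulary = sorted({term for doc in tokenized for term in doc})
    index = {term: i for i, term in enumerate(vocabulary)}
    counts = np.zeros((len(vocabulary), len(corpus)))
    for j, doc in enumerate(tokenized):
        for term, tf in Counter(doc).items():
            counts[index[term], j] = tf
    return vocabulary, counts


def entropy_weight_dictionary(vocabulary, counts):
    """Assumed global weighting: g_i = 1 + sum_j p_ij log p_ij / log n (needs n >= 2)."""
    n_docs = counts.shape[1]
    p = counts / counts.sum(axis=1, keepdims=True)  # share of term i found in doc j
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)  # treat 0 log 0 as 0
    weights = 1.0 + plogp.sum(axis=1) / math.log(n_docs)
    return dict(zip(vocabulary, weights))  # the weighted term dictionary


def reduced_svd(weighted, k):
    """SVD of the weighted matrix, keeping only the k largest singular values."""
    u, s, _vt = np.linalg.svd(weighted, full_matrices=False)
    return u[:, :k], s[:k]  # NumPy returns s in descending order


def project(text, vocabulary, weights, u_k, s_k):
    """Fold a new document into the k-dimensional space: d_hat = d^T U_k S_k^-1."""
    index = {term: i for i, term in enumerate(vocabulary)}
    d = np.zeros(len(vocabulary))
    for term, tf in Counter(text.lower().split()).items():
        if term in index:  # terms absent from the dictionary are ignored
            d[index[term]] = tf * weights[term]
    return (d @ u_k) / s_k


def cosine(a, b):
    """Cosine similarity of two pseudo-document vectors (assumed similarity measure)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


# Hypothetical toy corpus with two unrelated topics.
corpus = [
    "heart attack symptoms chest pain",
    "chest pain and heart disease",
    "stock market prices fall",
    "market investors sell stock",
]
vocab, counts = build_term_document_matrix(corpus)
weights = entropy_weight_dictionary(vocab, counts)
# Apply the dictionary's global weight to every row of the count matrix.
weighted = counts * np.array([weights[t] for t in vocab])[:, None]
u_k, s_k = reduced_svd(weighted, k=2)

a = project("chest pain heart", vocab, weights, u_k, s_k)
b = project("heart disease symptoms", vocab, weights, u_k, s_k)
c = project("stock prices", vocab, weights, u_k, s_k)
print(cosine(a, b), cosine(a, c))
```

Because the two topics in this toy corpus share no terms, the two medical queries should score close to 1 and the finance query close to 0. Note that k must not exceed the rank of the weighted matrix; otherwise S_k contains zeros and the division in project produces infinities.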