A term-by-document matrix, representing the frequency of occurrence of
each term in each document, is compiled from a corpus of documents
representative of a particular subject matter. A weighted term
dictionary is created using a global weighting algorithm and then applied
to the term-by-document matrix, forming a weighted term-by-document matrix.
A term vector matrix and a singular value concept matrix are computed by
singular value decomposition of the weighted term-by-document matrix. The k
largest singular values are kept and all others are set to zero,
thereby reducing the term vector matrix and the singular value concept
matrix to k concept dimensions. The reduced term vector matrix, reduced
singular value concept matrix, and weighted term dictionary can be
used to project pseudo-document vectors, representing documents not
appearing in the original document corpus, into a representative semantic
space. The similarity of those documents can be ascertained from the
positions of their respective pseudo-document vectors in the representative
semantic space.
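
The steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not a definitive implementation: the text does not name the global weighting algorithm, so a smoothed inverse document frequency is assumed here, and cosine similarity is assumed as the measure of closeness in the semantic space. The function names and the toy corpus are hypothetical.

```python
import numpy as np

def build_semantic_space(counts, k):
    """Build a k-dimensional space from a terms x documents count matrix."""
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[1]
    # Weighted term dictionary: one global weight per term.
    # ASSUMPTION: smoothed inverse document frequency; the source
    # only says "a global weighting algorithm".
    doc_freq = np.count_nonzero(counts, axis=1)
    weights = np.log((1.0 + n_docs) / (1.0 + doc_freq)) + 1.0
    weighted = counts * weights[:, None]
    # weighted = U @ diag(s) @ Vt, with s sorted in descending order.
    U, s, _ = np.linalg.svd(weighted, full_matrices=False)
    # Keeping the first k columns and values is equivalent to
    # setting every singular value after the k-th to zero.
    return weights, U[:, :k], s[:k]

def project(term_counts, weights, U_k, s_k):
    """Project an unseen document into the space as a pseudo-document vector."""
    d = np.asarray(term_counts, dtype=float) * weights
    # Standard folding-in formula: d_hat = d^T U_k S_k^{-1}.
    # Requires the k retained singular values to be nonzero.
    return (d @ U_k) / s_k

def cosine(a, b):
    """Cosine similarity (ASSUMED similarity measure)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy corpus: 5 terms (rows) x 4 documents (columns).
counts = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
    [1, 0, 0, 1],
]
weights, U_k, s_k = build_semantic_space(counts, k=2)

# Three documents that were not part of the original corpus.
doc_a = project([1, 1, 0, 0, 0], weights, U_k, s_k)
doc_b = project([2, 1, 0, 0, 0], weights, U_k, s_k)
doc_c = project([0, 0, 1, 1, 0], weights, U_k, s_k)

sim_ab = cosine(doc_a, doc_b)  # documents sharing terms 0 and 1
sim_ac = cosine(doc_a, doc_c)  # documents with no terms in common
```

In this toy corpus, doc_a and doc_b use the same terms and so land close together in the reduced space, while doc_c uses a disjoint set of terms and lands far from doc_a. The choice of k is a tuning parameter and must not exceed the rank of the weighted matrix.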