A method and system for performing automatic text analysis are described. A
local ranking for one or more contexts with respect to a word and a
global ranking for one or more contexts are generated. The rankings are
based on the frequency with which the contexts appear in a corpus. A
statistic may be generated using the local and global rankings, such as a
log ratio rank statistic equal to the logarithm of the global rank
divided by the local rank, to measure the similarity of contexts with respect
to the words with which they combine. A source matrix of word-to-context
values is then created. Singular value decomposition is used to create
sub-matrices from the source matrix. Vectors from the sub-matrices
corresponding to context(s) and/or word(s) are then selected to determine
term-term or context-context similarity or term-context correspondence.
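
The ranking and matrix-building steps described above can be illustrated with a minimal sketch. It is an assumption-laden illustration rather than the described system: the toy corpus, the function names (rank_by_frequency, log_rank_ratio, source_matrix), the alphabetical tie-breaking, the use of the natural logarithm, and the choice of 0.0 for unobserved word-context pairs are all hypothetical choices made only for this example.

```python
import math
from collections import Counter

# Hypothetical toy corpus of (word, context) observations; "_" marks the word slot.
observations = [
    ("dog", "the _ barked"), ("dog", "the _ barked"), ("dog", "a _ ran"),
    ("cat", "a _ ran"), ("cat", "a _ ran"), ("cat", "the _ slept"),
    ("car", "the _ stopped"), ("car", "the _ stopped"), ("car", "a _ ran"),
]


def rank_by_frequency(counts):
    """Map each item to its rank (1 = most frequent); ties broken alphabetically."""
    ordered = sorted(counts, key=lambda item: (-counts[item], item))
    return {item: position for position, item in enumerate(ordered, start=1)}


def log_rank_ratio(pairs):
    """Return {(word, context): log(global rank / local rank)} for observed pairs."""
    # Global ranking: context frequency over the whole corpus.
    global_rank = rank_by_frequency(Counter(context for _, context in pairs))

    # Local rankings: context frequency restricted to each word.
    local_counts = {}
    for word, context in pairs:
        local_counts.setdefault(word, Counter())[context] += 1

    scores = {}
    for word, counts in local_counts.items():
        for context, local in rank_by_frequency(counts).items():
            scores[(word, context)] = math.log(global_rank[context] / local)
    return scores


def source_matrix(scores):
    """Build a dense word-by-context matrix; unobserved pairs get 0.0 (an assumption)."""
    words = sorted({word for word, _ in scores})
    contexts = sorted({context for _, context in scores})
    matrix = [[scores.get((w, c), 0.0) for c in contexts] for w in words]
    return words, contexts, matrix


scores = log_rank_ratio(observations)
words, contexts, matrix = source_matrix(scores)
```

A positive value indicates that a context ranks higher for the word than it does in the corpus overall. The resulting matrix could then be passed to a singular value decomposition routine (for example numpy.linalg.svd) to obtain the sub-matrices from which word and context vectors are selected; that step is omitted here to keep the sketch free of third-party dependencies.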