Methods are disclosed for performing proper word alignment that satisfies
the constraints of coverage and transitive closure. Initially, a
translation matrix is computed that defines word association measures
between source and target words of a corpus of bilingual translations of
source and target sentences. Subsequently, in a first method, the association
measures in the translation matrix are factorized and orthogonalized to
produce cepts for the source and target words; the resulting matrix
factors may then optionally be multiplied to produce an alignment
matrix. In a second method, the association measures in the translation
matrix are thresholded and then closed by transitivity to produce an
alignment matrix, which may then optionally be factorized to produce
cepts. The resulting cepts or alignment matrices may then be used by any
number of natural language applications for identifying words that are
properly aligned.
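
By way of illustration only, the second method may be sketched as follows for a single sentence pair. The sketch assumes that the association measures are given as a list of rows (one row per source word, one column per target word) and that "closed by transitivity" may be realized by merging linked words into connected groups. The function names `close_alignment` and `cepts_from_alignment`, the threshold value, and the grouping of identical rows as a stand-in for factorization are hypothetical choices made for this example and are not taken from the disclosure. The sketch does not enforce the coverage constraint; a word with no association at or above the threshold simply remains unaligned.

```python
def close_alignment(assoc, threshold):
    """Threshold an association matrix and close the result by transitivity.

    assoc[i][j] is the association between source word i and target word j.
    Returns a 0/1 alignment matrix of the same shape.
    """
    n_src = len(assoc)
    n_tgt = len(assoc[0]) if assoc else 0
    # Union-find over all words: nodes 0..n_src-1 are source words,
    # nodes n_src..n_src+n_tgt-1 are target words.
    parent = list(range(n_src + n_tgt))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Thresholding: keep a link only if its association is high enough,
    # and merge the groups of the two linked words.
    for i in range(n_src):
        for j in range(n_tgt):
            if assoc[i][j] >= threshold:
                parent[find(i)] = find(n_src + j)

    # Transitive closure: a source and a target word are aligned
    # whenever they ended up in the same group.
    return [
        [int(find(i) == find(n_src + j)) for j in range(n_tgt)]
        for i in range(n_src)
    ]


def cepts_from_alignment(alignment):
    """Group a closed alignment matrix into (source indices, target indices).

    Assumption: in a closed alignment, rows are either identical or
    disjoint, so grouping identical non-empty rows recovers one group per cept.
    """
    cepts = []
    seen = set()
    for i, row in enumerate(alignment):
        if i in seen or not any(row):
            continue
        src = [k for k, other in enumerate(alignment) if other == row]
        tgt = [j for j, a in enumerate(row) if a]
        seen.update(src)
        cepts.append((src, tgt))
    return cepts


# Hypothetical association measures for 3 source and 3 target words.
assoc = [
    [0.9, 0.8, 0.1],
    [0.7, 0.2, 0.0],
    [0.0, 0.1, 0.6],
]
alignment = close_alignment(assoc, threshold=0.5)
print(alignment)                        # [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
print(cepts_from_alignment(alignment))  # [([0, 1], [0, 1]), ([2], [2])]
```

In this example the link between source word 1 and target word 1 falls below the threshold (0.2) but is added by the closure, because both words are already linked through source word 0 and target word 0; the two resulting groups correspond to two cepts.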