The present invention relates to systems, methods, and computer program
products for the analysis of gene expression data, especially data that
have been acquired using microarray technologies. In particular, the
present invention relates to methods for analyzing a set of genes that
have been partitioned into disjoint subsets known as clusters. The
invention provides methods for quantitatively evaluating the quality of
a gene clustering, based on the extent to which similarities among the
documents associated with the genes in a cluster collectively
distinguish that cluster from all the other clusters, as well as the
extent to which words and phrases present in those documents
collectively distinguish that cluster from all the other clusters.
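
As a rough illustration of the first criterion only, the sketch below scores a
clustering by comparing document similarity within clusters against document
similarity between clusters. It is a minimal sketch under stated assumptions,
not the claimed method: it assumes one text document per gene, a plain
bag-of-words representation, and cosine similarity, and the names `cosine`,
`cluster_separation`, and the gene labels are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_separation(docs_by_gene, clusters):
    """Mean within-cluster similarity minus mean between-cluster similarity.

    docs_by_gene: dict mapping gene name -> document text (assumed: one per gene).
    clusters: list of disjoint lists of gene names.
    A higher score means the documents separate the clusters more cleanly.
    """
    # Assumption: whitespace tokenization of lower-cased text.
    vectors = {g: Counter(t.lower().split()) for g, t in docs_by_gene.items()}
    label = {g: i for i, genes in enumerate(clusters) for g in genes}
    genes = sorted(label)

    within, between = [], []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            sim = cosine(vectors[g], vectors[h])
            (within if label[g] == label[h] else between).append(sim)

    def mean(xs):
        return sum(xs) / len(xs) if xs else 0.0

    return mean(within) - mean(between)


# Hypothetical toy data, for illustration only.
docs = {
    "geneA": "dna repair pathway",
    "geneB": "dna repair enzyme",
    "geneC": "lipid transport membrane",
    "geneD": "lipid transport protein",
}
coherent = cluster_separation(docs, [["geneA", "geneB"], ["geneC", "geneD"]])
mixed = cluster_separation(docs, [["geneA", "geneC"], ["geneB", "geneD"]])
print(coherent > mixed)
```

In this toy data, the clustering that groups genes with similar documents
scores higher than the one that mixes them, which is the behavior a quality
measure of this kind is meant to capture.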