A method and system are provided for determining the similarity, or
correlation, between categories of a hierarchical document taxonomy by
combining heterogeneous similarity metrics. A correlation system uses
both a taxonomy distance metric and a term space distance metric to
represent the similarity between categories. The correlation system finds
a new distance metric for categories that factors in both the taxonomy
distance metric and the term space distance metric. The new distance
metric can then be used by classifiers to more accurately represent the
correlation between categories.
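
As an illustration only, the following minimal sketch shows one way a taxonomy distance and a term space distance could be folded into a single category distance. The abstract does not state how the two metrics are combined, so the weighted sum used here is an assumption, as are the tree path length, the cosine distance, the toy taxonomy, the term weights, and every name in the code (`PARENT`, `TERMS`, `combined_distance`, `alpha`, `max_tax`).

```python
import math

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "root": None,
    "sports": "root",
    "science": "root",
    "soccer": "sports",
    "tennis": "sports",
    "physics": "science",
}

# Hypothetical term-weight vectors, one per leaf category.
TERMS = {
    "soccer": {"ball": 0.9, "goal": 0.8, "match": 0.5},
    "tennis": {"ball": 0.8, "racket": 0.9, "match": 0.6},
    "physics": {"energy": 0.9, "particle": 0.8, "ball": 0.1},
}


def path_to_root(category):
    """Return the list of categories from `category` up to the root."""
    path = []
    while category is not None:
        path.append(category)
        category = PARENT[category]
    return path


def taxonomy_distance(a, b):
    """Number of edges on the tree path between categories a and b."""
    steps_from_a = {c: i for i, c in enumerate(path_to_root(a))}
    for steps_from_b, c in enumerate(path_to_root(b)):
        if c in steps_from_a:  # first shared ancestor
            return steps_from_a[c] + steps_from_b
    raise ValueError("categories do not share a root")


def term_space_distance(a, b):
    """Cosine distance (1 - cosine similarity) between term vectors."""
    va, vb = TERMS[a], TERMS[b]
    dot = sum(w * vb.get(term, 0.0) for term, w in va.items())
    norm_a = math.sqrt(sum(w * w for w in va.values()))
    norm_b = math.sqrt(sum(w * w for w in vb.values()))
    return 1.0 - dot / (norm_a * norm_b)


def combined_distance(a, b, alpha=0.5, max_tax=4):
    """Assumed combination: weighted sum of the two distances.

    `max_tax` is the longest path in the toy tree; dividing by it
    puts the taxonomy distance on the same 0..1 scale as the
    cosine distance.
    """
    tax = taxonomy_distance(a, b) / max_tax
    return alpha * tax + (1.0 - alpha) * term_space_distance(a, b)


if __name__ == "__main__":
    for pair in [("soccer", "tennis"), ("soccer", "physics")]:
        print(pair, round(combined_distance(*pair), 3))
```

In this sketch, `alpha` sets how much weight the taxonomy structure receives relative to the term space; a classifier could consume the resulting pairwise distances in place of either metric alone.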