A similarity detector detects similar or near-duplicate occurrences of a
document. The similarity detector determines similarity of documents by
characterizing the documents as clusters, each made up of a set of term
entries, such as pairs of terms. A pair of terms, for example, indicates
that the first term of the pair occurs before the second term of the pair
in the underlying document. Another document that shares at least a
threshold level of term entries with a cluster is considered similar to
the document characterized by that cluster.
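
The comparison described above can be illustrated with a minimal Python sketch. It is not the detector's actual implementation: the function names `term_pairs` and `is_similar`, the tokenization rule, the use of first occurrences to decide which term comes "before" another, and the 0.7 threshold (read here as a fraction of the cluster's entries) are all assumptions made for illustration.

```python
import re
from itertools import combinations


def term_pairs(text):
    """Build a cluster: the set of ordered pairs (a, b) such that term a
    first occurs before term b in the text (assumed pairing rule)."""
    # Assumed tokenization: lowercase runs of letters and digits.
    terms = re.findall(r"[a-z0-9]+", text.lower())
    # dict.fromkeys drops repeats while keeping first-occurrence order.
    ordered_unique = list(dict.fromkeys(terms))
    # combinations() preserves input order, so each pair is (earlier, later).
    return set(combinations(ordered_unique, 2))


def is_similar(cluster, candidate_text, threshold=0.7):
    """Return True if the candidate contains at least `threshold` (a
    fraction, assumed) of the term entries in the cluster."""
    if not cluster:
        return False
    shared = cluster & term_pairs(candidate_text)
    return len(shared) / len(cluster) >= threshold


# Hypothetical example documents.
cluster = term_pairs("the quick brown fox jumps over the lazy dog")
near_duplicate = is_similar(cluster, "the quick brown fox leaps over the lazy dog")
reordered = is_similar(cluster, "dog lazy over jumps fox brown quick the")
print(near_duplicate, reordered)
```

In this sketch the one-word substitution still shares 21 of the cluster's 28 pairs, so it passes the assumed threshold, while the reversed sentence shares none because every pair's order is flipped. This shows why ordered pairs capture more than a plain bag of terms would.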