In a signature-based duplicate detection system, multiple different
lexicons are used to generate a document signature that comprises
multiple sub-signatures. The signature of an e-mail or other document may
be defined as the set of sub-signatures generated based on the multiple
different lexicons. When a collection of sub-signatures is used as a
document's signature, two documents may be considered as being duplicates
when a sub-signature generated based on a particular lexicon in the
collection for the first document matches the sub-signature generated based on
the same lexicon in the collection for the second document.
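
The matching rule described above can be illustrated with a minimal Python sketch. The text does not specify how a sub-signature is derived from a lexicon, so the sketch assumes one plausible scheme: keep the document's unique tokens that appear in the lexicon, sort them, and hash the result with SHA-1. The function names (sub_signature, signature, is_duplicate), the tokenizer, and the example lexicons are all hypothetical and chosen only for illustration.

```python
import hashlib
import re


def sub_signature(text, lexicon):
    """Hash the document's unique tokens that also appear in the lexicon.

    Assumption: this filtering-and-hashing scheme is illustrative only;
    the source text does not define how a sub-signature is computed.
    """
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    kept = sorted(tokens & lexicon)
    if not kept:
        # No lexicon terms present: treat as "no sub-signature" so that
        # two unrelated documents do not match on an empty term set.
        return None
    return hashlib.sha1(" ".join(kept).encode("utf-8")).hexdigest()


def signature(text, lexicons):
    """A document's signature: one sub-signature per lexicon, in order."""
    return [sub_signature(text, lexicon) for lexicon in lexicons]


def is_duplicate(sig_a, sig_b):
    """Duplicates if the sub-signatures for any one lexicon match."""
    return any(a is not None and a == b for a, b in zip(sig_a, sig_b))


# Hypothetical lexicons and documents, for illustration only.
lexicons = [
    frozenset({"meeting", "budget", "report"}),
    frozenset({"quarterly", "budget", "deadline"}),
]
doc1 = "Quarterly budget report attached, meeting at noon."
doc2 = "Hi all, budget report attached. Meeting moved to 3pm; deadline Friday."
doc3 = "Lunch menu for Friday"

sig1 = signature(doc1, lexicons)
sig2 = signature(doc2, lexicons)
sig3 = signature(doc3, lexicons)

print(is_duplicate(sig1, sig2))
print(is_duplicate(sig1, sig3))
```

In this example the first two documents share the same terms from the first lexicon but differ on the second, so they are still treated as duplicates under the any-one-lexicon rule. Comparing sub-signatures position by position ensures that a match is only counted when both sub-signatures come from the same lexicon.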