A classification system includes a signature-based duplicate detector and
an inductive classifier that share attribute information. To perform the
duplicate detection and the classification, the duplicate detector and
inductive classifier are first initialized by generating a lexicon of
attributes for the duplicate detector and a classification model for the
classifier. To develop the classification model, the classifier uses a
training set of documents of known class to determine the document
attributes that are most useful in classifying an unknown document.
The model is developed from these attributes. Attribute
information containing the attributes determined by the classifier is
then passed to the duplicate detector, which uses that information to
generate the lexicon of attributes.
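
The initialization flow described above can be illustrated with a minimal sketch. It is not the claimed implementation: the function names (`select_attributes`, `build_lexicon`, `signature`), the whitespace tokenizer, the class-spread scoring rule, and the use of SHA-1 are all assumptions chosen for brevity. The sketch shows only the shared-attribute idea: the classifier ranks attributes from labeled training documents, and the duplicate detector reuses those attributes as its lexicon when computing document signatures.

```python
import hashlib
from collections import Counter


def select_attributes(training_docs, top_k=4):
    """Classifier side (hypothetical): rank tokens by how unevenly they
    are spread across classes and keep the top_k as attributes."""
    per_class = {}
    for text, label in training_docs:
        # Count each token once per document (document frequency per class).
        per_class.setdefault(label, Counter()).update(set(text.lower().split()))
    vocab = set().union(*per_class.values())

    def score(token):
        # Assumed stand-in for a real measure such as information gain.
        counts = [counter[token] for counter in per_class.values()]
        return max(counts) - min(counts)

    # Sort by descending score, then alphabetically for a stable result.
    ranked = sorted(vocab, key=lambda t: (-score(t), t))
    return ranked[:top_k]


def build_lexicon(attribute_info):
    """Duplicate-detector side: the lexicon is built from the attribute
    information passed over by the classifier."""
    return frozenset(attribute_info)


def signature(text, lexicon):
    """Hash only the tokens that appear in the lexicon, in sorted order,
    so documents sharing the same lexicon tokens get the same signature."""
    kept = sorted(set(text.lower().split()) & lexicon)
    return hashlib.sha1(" ".join(kept).encode("utf-8")).hexdigest()


# Illustrative training set of documents of known class (invented data).
training = [
    ("cheap pills buy now", "spam"),
    ("buy cheap watches now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("monday project meeting notes", "ham"),
]

attributes = select_attributes(training)   # classifier determines attributes
lexicon = build_lexicon(attributes)        # detector generates its lexicon

# Two documents with the same lexicon tokens are flagged as duplicates.
is_duplicate = signature("Buy cheap pills today", lexicon) == signature(
    "cheap stuff buy here", lexicon
)
```

In this sketch, tokens outside the lexicon are ignored when the signature is computed, so the detector compares documents only on the attributes the classifier found informative.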