A method and apparatus are provided for multi-class, multi-label
information categorization. A weight is assigned to each information
sample in a training set, the training set containing a plurality of
information samples, such as text documents, and associated labels. A base
hypothesis is determined to predict which labels are associated with a
given information sample. The base hypothesis predicts whether or not each
label is associated with the information sample or predicts the likelihood
that each label is associated with the information sample. In the case of
a document, the base hypothesis evaluates words in each document to
determine one or more words that predict the associated labels. When a
base hypothesis is determined, the weight assigned to each information
sample in the training set is modified based on the base hypothesis
predictions.
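
By way of illustration only, one round of the procedure described above may be sketched as follows. The sketch assumes that documents are represented as sets of words, that a base hypothesis tests for the presence of a single word and predicts one label set when the word is present and another when it is absent, and that the per-sample error is the fraction of labels predicted incorrectly. The function names `fit_word_hypothesis` and `reweight`, the exponential update rule, and the sample data are hypothetical choices made for this example and are not taken from the description.

```python
import math


def fit_word_hypothesis(docs, labels, all_labels, weights):
    """Pick the single word whose presence/absence best predicts the labels.

    Returns (weighted_error, word, predicted, per_sample_errors), where
    predicted[True] / predicted[False] are the label sets predicted when
    the word is present / absent (an assumed form of base hypothesis).
    """
    vocab = sorted(set().union(*docs))
    best = None
    for word in vocab:
        predicted = {}
        for present in (True, False):
            chosen = set()
            for label in all_labels:
                # Weighted mass of matching samples with and without the label.
                with_label = sum(w for d, y, w in zip(docs, labels, weights)
                                 if (word in d) == present and label in y)
                without = sum(w for d, y, w in zip(docs, labels, weights)
                              if (word in d) == present and label not in y)
                if with_label > without:
                    chosen.add(label)
            predicted[present] = chosen
        # Fraction of labels mispredicted on each sample (symmetric difference).
        errors = [len(predicted[word in d] ^ y) / len(all_labels)
                  for d, y in zip(docs, labels)]
        eps = sum(w * e for w, e in zip(weights, errors))
        if best is None or eps < best[0]:
            best = (eps, word, predicted, errors)
    return best


def reweight(weights, errors, eps):
    """Raise the weight of poorly predicted samples, then renormalize.

    The exponential rule below is an assumed, AdaBoost-style update.
    """
    eps = min(max(eps, 1e-9), 1 - 1e-9)  # guard against log(0)
    alpha = 0.5 * math.log((1 - eps) / eps)
    raw = [w * math.exp(alpha * (2 * e - 1)) for w, e in zip(weights, errors)]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical training set: four short documents with one or two labels each.
docs = [{"goal", "match"}, {"election", "vote"},
        {"match", "vote"}, {"goal", "team"}]
labels = [{"sports"}, {"politics"}, {"sports", "politics"}, {"sports"}]
all_labels = ["sports", "politics"]
weights = [1 / len(docs)] * len(docs)  # initial uniform weights

eps, word, predicted, errors = fit_word_hypothesis(docs, labels, all_labels, weights)
new_weights = reweight(weights, errors, eps)
print(word, predicted, new_weights)
```

In this sketch the third document, which carries both labels, is the only one the selected hypothesis predicts imperfectly, so its weight increases relative to the others. A subsequent round run on `new_weights` would therefore favor a hypothesis that handles that document better.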