Documents are clustered or categorized to generate a model associating
documents with classes. Outlier measures are computed for the documents
indicative of how well each document fits into the model. Outlier
documents are identified to a user based on the outlier measures and a
user-selected outlier criterion. Ambiguity measures are computed for the
documents indicative of a number of classes with which each document has
similarity under the model. If a document is annotated with a label
class, a possible corrective label class is identified if the annotated
document has higher similarity with the possible corrective label class
under the model than with the annotated label class. The clustering or
categorizing is repeated, adjusted based on received user input, to
generate an updated model associating documents with classes. Outlier and
ambiguity measures are also calculated at runtime for new documents
classified using the model.
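
The measures described above can be illustrated with a minimal sketch. It assumes the model exposes, for each document, a similarity score per class (for example a probability P(class | document)); the abstract does not fix that form. The function names, the "1 minus best score" outlier measure, the ratio-based ambiguity count, and the sample data are all hypothetical choices made for illustration, not the method of the source.

```python
from typing import Dict, List, Optional

# Assumed model output: similarity score per class for one document.
Scores = Dict[str, float]


def outlier_measure(scores: Scores) -> float:
    """Assumed measure: 1 minus the best class score (higher = worse fit)."""
    return 1.0 - max(scores.values())


def ambiguity_measure(scores: Scores, ratio: float = 0.5) -> int:
    """Assumed measure: count of classes scoring at least `ratio` times the best score."""
    best = max(scores.values())
    return sum(1 for s in scores.values() if s >= ratio * best)


def corrective_label(scores: Scores, annotated: str) -> Optional[str]:
    """Return a class with strictly higher similarity than the annotated one, if any."""
    best = max(scores, key=lambda c: scores[c])
    return best if scores[best] > scores[annotated] else None


def find_outliers(docs: Dict[str, Scores], threshold: float) -> List[str]:
    """Apply a user-selected criterion, here a simple threshold on the outlier measure."""
    return [name for name, s in docs.items() if outlier_measure(s) >= threshold]


# Made-up scores for three documents and three classes.
docs = {
    "doc_a": {"sports": 0.90, "politics": 0.05, "science": 0.05},
    "doc_b": {"sports": 0.40, "politics": 0.35, "science": 0.25},
    "doc_c": {"sports": 0.10, "politics": 0.20, "science": 0.70},
}

if __name__ == "__main__":
    for name, scores in docs.items():
        print(name, round(outlier_measure(scores), 2), ambiguity_measure(scores))
    print(find_outliers(docs, threshold=0.5))
    print(corrective_label(docs["doc_c"], annotated="politics"))
```

The same functions would apply unchanged to a new document scored at runtime. After user input (for example, accepting a proposed corrective label), the model would be retrained and the scores recomputed.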