Systems and methods for clustering-based text classification are
described. In one aspect, text is clustered as a function of labeled data
to generate cluster(s). The text includes the labeled data and unlabeled
data. Expanded labeled data is then generated as a function of the
cluster(s). The expanded labeled data includes the labeled data and at
least a portion of the unlabeled data. Discriminative classifier(s) are then
trained based on the expanded labeled data and remaining ones of the
unlabeled data.
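
The three steps above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the described implementation: documents are taken to be already converted to numeric feature vectors, the clustering is a seeded k-means with one centroid per class, the discriminative classifier is a plain perceptron, and the remaining unlabeled data is used through a simple pseudo-labeling pass. The function names, the `per_cluster` parameter, and the toy data are all hypothetical.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def seeded_kmeans(labeled, unlabeled, iters=10):
    """Cluster all vectors, seeding one centroid per class from the labeled data.

    labeled: list of (vector, label); unlabeled: list of vectors.
    Returns (centroid per label, cluster label for each unlabeled vector).
    """
    classes = sorted({y for _, y in labeled})
    centroids = {c: mean([x for x, y in labeled if y == c]) for c in classes}
    assign = []
    for _ in range(iters):
        # Unlabeled vectors join the nearest centroid; labeled ones keep their class.
        assign = [min(classes, key=lambda c: dist(x, centroids[c])) for x in unlabeled]
        for c in classes:
            members = [x for x, y in labeled if y == c]
            members += [x for x, a in zip(unlabeled, assign) if a == c]
            centroids[c] = mean(members)
    return centroids, assign

def expand_labels(labeled, unlabeled, centroids, assign, per_cluster=1):
    """Promote the per_cluster unlabeled vectors closest to each centroid."""
    promoted = set()
    for c in centroids:
        idx = [i for i, a in enumerate(assign) if a == c]
        idx.sort(key=lambda i: dist(unlabeled[i], centroids[c]))
        promoted.update(idx[:per_cluster])
    expanded = list(labeled) + [(unlabeled[i], assign[i]) for i in sorted(promoted)]
    remaining = [x for i, x in enumerate(unlabeled) if i not in promoted]
    return expanded, remaining

def train_perceptron(data, epochs=20):
    """Binary perceptron; labels must be +1 or -1 (an assumption of this sketch)."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

def predict(model, x):
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Toy data: two labeled vectors, four unlabeled ones.
labeled = [([-2.0, -2.0], -1), ([2.0, 2.0], 1)]
unlabeled = [[-1.5, -1.8], [-1.0, -0.8], [1.5, 1.8], [1.0, 0.9]]

# Step 1: cluster as a function of the labeled data.
centroids, assign = seeded_kmeans(labeled, unlabeled)
# Step 2: generate expanded labeled data from the clusters.
expanded, remaining = expand_labels(labeled, unlabeled, centroids, assign)
# Step 3: train on the expanded labeled data, then fold in the remaining
# unlabeled data with the labels the first model predicts for it.
model = train_perceptron(expanded)
pseudo = [(x, predict(model, x)) for x in remaining]
final_model = train_perceptron(expanded + pseudo)
```

The pseudo-labeling pass is only one simple way to let the remaining unlabeled data influence training. A transductive classifier could take its place without changing the first two steps.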