A computer-assisted/implemented method for developing a classifier for
classifying communications includes roughly four stages, where these
stages are designed to be iterative: (1) a stage defining where and how
to harvest messages (e.g., from Internet message boards, newsgroups, and
the like), which also defines an expected domain of application for the
classifier; (2) a guided question/answering stage for the computerized
tool to elicit the user's criteria for determining whether a message is
relevant or irrelevant; (3) a labeling stage where the user examines
carefully selected messages and provides feedback about whether or not each
is relevant and sometimes also which elements of the criteria were used to
make the decision; and (4) a performance evaluation stage where
parameters of the classifier training are optimized, the best classifier
is produced, and known performance bounds are calculated. In the guided
question/answering stage, the criteria are parameterized in such a way
that (a) they can be operationalized into the text classifier through key
words and phrases, and (b) human-readable criteria can be produced,
which can be reviewed and edited. The labeling stage is oriented toward
an extended Active Learning framework. That is, the exemplary embodiment
decides which example messages to show the user based upon what category
of messages the system thinks would be most useful to the Active Learning
process.
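
By way of illustration only, the selection of messages for labeling might be sketched as follows. This is a minimal sketch of plain uncertainty sampling, a common Active Learning strategy, and is not the extended framework of the exemplary embodiment, whose selection rule is not detailed here. The function names, the keyword weights, and the logistic scoring are all hypothetical assumptions introduced for the example.

```python
import math


def keyword_score(message, weights, bias=0.0):
    """Estimate the probability that a message is relevant.

    Hypothetical scoring: sum the weights of the keywords that occur in the
    message and squash the total through a logistic function.
    """
    text = message.lower()
    z = bias + sum(w for keyword, w in weights.items() if keyword in text)
    return 1.0 / (1.0 + math.exp(-z))


def select_for_labeling(messages, weights, k=2):
    """Return the k messages the scorer is least certain about.

    Uncertainty is measured as the distance of the score from 0.5; ties keep
    their original order because Python's sort is stable.
    """
    ranked = sorted(messages, key=lambda m: abs(keyword_score(m, weights) - 0.5))
    return ranked[:k]


# Hypothetical criteria, expressed as weighted keywords (assumed values).
weights = {"battery": 2.0, "recall": 1.5, "weather": -2.0}

# Hypothetical harvested messages.
messages = [
    "The battery recall affected my phone",
    "Nice weather today",
    "Is the battery affected by cold weather?",
    "Anyone going to the game?",
]

# The two messages with no net keyword evidence are chosen for the user.
to_label = select_for_labeling(messages, weights, k=2)
```

In this sketch the user's answers on the selected messages would then be used to adjust the weights before the next round, which reflects the iterative character of the stages.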