A method and system for classifying messages of a discussion thread as
questions is provided. A classification system generates a classifier to
classify messages of discussion threads as question messages or
non-question messages. The classification system trains the classifier using feature
vectors and input classifications derived from a training set of
discussion threads. After the classifier is trained, the classification
system uses the classifier to classify messages within a corpus of
discussion threads as question or non-question messages. To classify a
message, the classification system generates a feature vector for the
message and submits that feature vector to the classifier. The
classifier generates a score for the message indicating a likelihood that
the message is a question message.
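
As an illustrative sketch only, the train-then-score flow described above might look as follows in Python. The feature set (presence of a question mark, a leading interrogative word), the logistic-regression model, the sample messages, and all names (`QUESTION_WORDS`, `feature_vector`, `train`, `score`) are assumptions made for illustration; the description above does not specify them.

```python
import math

# Hypothetical cue words; the source does not say which features are used.
QUESTION_WORDS = {"what", "why", "how", "when", "where", "who", "which", "can", "does"}


def feature_vector(message):
    """Map a message to a small numeric feature vector (illustrative features only)."""
    words = message.lower().split()
    first = words[0].strip("?,.!") if words else ""
    return [
        1.0,                                      # bias term
        1.0 if "?" in message else 0.0,           # contains a question mark
        1.0 if first in QUESTION_WORDS else 0.0,  # starts with an interrogative word
    ]


def score(weights, features):
    """Return a value in (0, 1): the estimated likelihood of a question message."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train(messages, labels, epochs=200, lr=0.5):
    """Fit logistic-regression weights with plain stochastic gradient descent."""
    weights = [0.0] * len(feature_vector(""))
    for _ in range(epochs):
        for message, label in zip(messages, labels):
            x = feature_vector(message)
            error = score(weights, x) - label
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
    return weights


# Toy training set: messages with input classifications (1 = question, 0 = non-question).
training_messages = [
    "How do I reset my router?",
    "Why does the build fail on Windows?",
    "Can anyone recommend a good parser library",
    "Thanks, that fixed it.",
    "I had the same problem last week.",
    "Here is the log output you asked for.",
]
training_labels = [1, 1, 1, 0, 0, 0]

weights = train(training_messages, training_labels)

# Classify unseen messages: build a feature vector, then submit it to the classifier.
question_score = score(weights, feature_vector("What version should I install?"))
statement_score = score(weights, feature_vector("The update installed without errors."))
print(f"question-like message:  {question_score:.3f}")
print(f"statement-like message: {statement_score:.3f}")
```

In this sketch a message would be labeled a question message when its score exceeds a chosen threshold such as 0.5; the threshold is likewise an assumption rather than something stated above.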