Systems and methods for classifying documents into categories based on
text associated with the documents are disclosed. Embodiments of the
present invention further provide methods for establishing a database of
hierarchical classes and a system for classifying text-related content
into the hierarchical classes. Text relating to documents is parsed into
features, with at least one feature having a plurality of terms.
Vocabulary is determined from the features based on feature frequency,
and, for each class into which a document is classified, the vocabulary
that occurs in the text associated with the document is stored.
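
The steps summarized above (parsing text into features, selecting a vocabulary by feature frequency, and storing per-class vocabulary) can be illustrated with a minimal sketch. It is not the disclosed implementation: the function names, the use of one- and two-term word sequences as features, the `min_count` threshold, the slash-separated class paths, and the sample documents are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict


def parse_features(text, max_terms=2):
    """Parse text into features of 1..max_terms consecutive terms.

    Multi-term features (here, word pairs) are an assumed reading of
    "a feature having a plurality of terms".
    """
    terms = re.findall(r"[a-z0-9]+", text.lower())
    features = []
    for n in range(1, max_terms + 1):
        for i in range(len(terms) - n + 1):
            features.append(" ".join(terms[i:i + n]))
    return features


def build_vocabulary(texts, min_count=2):
    """Keep features whose total frequency reaches min_count.

    The threshold rule is hypothetical; the source only says the
    vocabulary is determined "based on feature frequency".
    """
    counts = Counter()
    for text in texts:
        counts.update(parse_features(text))
    return {feature for feature, count in counts.items() if count >= min_count}


def store_class_vocabulary(labeled_docs, vocabulary):
    """For each class, record the vocabulary occurring in its documents' text."""
    class_vocab = defaultdict(Counter)
    for text, classes in labeled_docs:
        present = set(parse_features(text)) & vocabulary
        for cls in classes:
            # Count the number of documents in the class containing each feature.
            class_vocab[cls].update(present)
    return class_vocab


# Hypothetical sample data; class paths stand in for hierarchical classes.
docs = [
    ("Digital camera with zoom lens", ["Electronics/Cameras"]),
    ("Compact digital camera bundle", ["Electronics/Cameras"]),
    ("Leather camera bag", ["Accessories/Bags"]),
]

vocabulary = build_vocabulary(text for text, _ in docs)
class_vocab = store_class_vocabulary(docs, vocabulary)
```

In this sketch the in-memory `class_vocab` mapping stands in for the database of hierarchical classes; a stored class entry could later be compared against the features of an unclassified document to score candidate classes.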