The present invention produces class-specific key words from a document
set. In particular, an importance of a word is defined to take into
account the number of documents in a corresponding class. A class may be,
for example, a class of author. The importance value of a word is based on
a ratio of the count value of a word in a corresponding class to the
number of documents in the class, less the ratio of the count value of the
word in other classes to the number of documents in the other classes.
Words are sorted by importance in each class. A characteristic key word in
each class may be derived from the target document set. In an alternative
embodiment, instead of a difference in ratios, the first ratio is divided
by the second ratio. With this alternative approach to calculating
importance, the variation in the magnitude of the importance is not
affected by the number of documents.
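
The two importance calculations described above can be illustrated with a minimal Python sketch. It is not the claimed implementation. The function name `importance_by_class`, the `mode` parameter, and the input layout (a list of `(class_label, words)` pairs) are hypothetical. The sketch assumes that the "count value" of a word is its total number of occurrences within a class, and that a zero second ratio in the alternative embodiment yields infinity; the text specifies neither point.

```python
from collections import Counter


def importance_by_class(docs, mode="difference"):
    """Rank words per class by importance.

    docs: list of (class_label, list_of_words) pairs (hypothetical layout).
    mode: "difference" for ratio_in - ratio_out,
          "ratio" for ratio_in / ratio_out (alternative embodiment).
    """
    # Number of documents in each class.
    doc_counts = Counter(label for label, _ in docs)

    # Word occurrence counts per class (assumed meaning of "count value").
    word_counts = {label: Counter() for label in doc_counts}
    for label, words in docs:
        word_counts[label].update(words)

    # Word occurrence counts over the whole document set.
    total_counts = Counter()
    for counts in word_counts.values():
        total_counts.update(counts)
    total_docs = len(docs)

    ranked = {}
    for label, counts in word_counts.items():
        n_in = doc_counts[label]
        n_out = total_docs - n_in
        scores = {}
        for word, c_in in counts.items():
            c_out = total_counts[word] - c_in
            ratio_in = c_in / n_in
            ratio_out = c_out / n_out if n_out else 0.0
            if mode == "difference":
                scores[word] = ratio_in - ratio_out
            else:
                # Assumption: a word absent from the other classes gets
                # infinite importance instead of raising ZeroDivisionError.
                scores[word] = ratio_in / ratio_out if ratio_out else float("inf")
        # Sort by descending importance; ties are broken alphabetically.
        ranked[label] = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked
```

Calling `importance_by_class(docs)` returns, for each class, a list of `(word, importance)` pairs in descending order, so the first entries are candidate characteristic key words for that class. Passing `mode="ratio"` selects the alternative embodiment.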