The features that are presented to an evolutionary algorithm are preprocessed
to generate combination features that may be more efficient in distinguishing among
classifications than the individual features that comprise the combination feature.
An initial set of features is defined that includes a large number of potential
features, including the generated features that are combinations of other features.
These features include, for example, all of the words used in a collection of content
material that has been previously classified, as well as combination features based
on these features, such as all the noun and verb phrases used. This pool of original
features and combination features is provided to an evolutionary algorithm for
a subsequent evaluation, generation, and determination of the best subset of features
to use for classification. In this evaluation and generation process, each combination
feature is treated as a feature in its own right, regardless of whether the features
used to form it are themselves selected. In this manner, for example,
a particular phrase that is generated as a combination of original feature words
may be determined to be a better distinguishing feature than any of the original
feature words and a more efficient distinguishing feature than an unrelated selection
of the individual feature words, as might be provided by a conventional evolutionary
algorithm. The resulting best-performing subset is subsequently used to characterize
new content material for automated classification. If the automated classification
includes a learning system, the evolutionary algorithm and the generated combination
features are also used to train the learning system.
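The preprocessing that builds the pool of original and combination features can be sketched as follows. This is a minimal illustration under stated assumptions, not the described system: a real implementation would extract noun and verb phrases with a parser, whereas here contiguous word n-grams are assumed as a simple stand-in for phrases. The function names `tokenize`, `combination_features`, and `build_feature_pool`, and the parameter `max_n`, are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def combination_features(tokens: list[str], max_n: int = 2) -> list[str]:
    """Generate adjacent word n-grams (2 <= n <= max_n) as combination features.

    Assumption: contiguous n-grams stand in for the noun and verb phrases
    that a parser would produce.
    """
    combos = []
    for n in range(2, max_n + 1):
        for i in range(len(tokens) - n + 1):
            combos.append(" ".join(tokens[i:i + n]))
    return combos


def build_feature_pool(documents: list[str], max_n: int = 2) -> list[str]:
    """Pool every word of the classified corpus with every combination feature."""
    pool = set()
    for doc in documents:
        tokens = tokenize(doc)
        pool.update(tokens)                                # original features
        pool.update(combination_features(tokens, max_n))  # generated combinations
    return sorted(pool)


corpus = ["Interest rates rose", "Rates fell"]
print(build_feature_pool(corpus))
# ['fell', 'interest', 'interest rates', 'rates', 'rates fell', 'rates rose', 'rose']
```

Each phrase such as "interest rates" enters the pool as a separate entry alongside "interest" and "rates", so a later selection step can keep or discard it independently of its component words.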
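The evolutionary selection of a best subset, and its use to characterize new content, can be sketched with a generic genetic algorithm. The specific choices here are assumptions swapped in for illustration, not the method described above: a bit-mask chromosome with one gene per pooled feature, tournament selection, uniform crossover, bit-flip mutation, elitism, and a fitness equal to the training accuracy of a simple nearest-centroid classifier minus a small per-feature penalty (`penalty`). All function names, parameters, and the tiny labeled corpus are hypothetical. Because a combination feature such as "interest rates" has its own gene, it can survive selection whether or not "interest" or "rates" do.

```python
import random


def fit_centroids(docs, labels, feats):
    """Nearest-centroid model: per-class mean presence of each selected feature."""
    model = {}
    for c in sorted(set(labels)):
        members = [d for d, y in zip(docs, labels) if y == c]
        model[c] = [sum(f in d for d in members) / len(members) for f in feats]
    return model


def predict(model, doc, feats):
    """Return the class whose centroid is closest (squared distance) to the doc."""
    x = [1.0 if f in doc else 0.0 for f in feats]
    return min(sorted(model),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, model[c])))


def fitness(mask, pool, docs, labels, penalty=0.01):
    """Training accuracy of the selected subset, minus a small per-feature cost."""
    feats = [f for f, bit in zip(pool, mask) if bit]
    if not feats:
        return 0.0
    model = fit_centroids(docs, labels, feats)
    correct = sum(predict(model, d, feats) == y for d, y in zip(docs, labels))
    return correct / len(docs) - penalty * len(feats)


def evolve(pool, docs, labels, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Genetic search over feature subsets; one independent gene per pooled feature."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in pool] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(((fitness(m, pool, docs, labels), m) for m in pop),
                        key=lambda t: t[0], reverse=True)
        nxt = [m for _, m in scored[:2]]  # elitism: carry the two best forward
        while len(nxt) < pop_size:
            # Tournament selection: best of three random individuals, twice
            p1 = max(rng.sample(scored, 3), key=lambda t: t[0])[1]
            p2 = max(rng.sample(scored, 3), key=lambda t: t[0])[1]
            # Uniform crossover, then independent bit-flip mutation per gene
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            nxt.append([(not g) if rng.random() < p_mut else g for g in child])
        pop = nxt
    best = max(pop, key=lambda m: fitness(m, pool, docs, labels))
    return [f for f, bit in zip(pool, best) if bit]


# Hypothetical pre-classified corpus; each document is given as its set of pooled features
docs = [
    {"interest", "rates", "interest rates", "rose"},
    {"interest", "rates", "interest rates", "fell"},
    {"free", "kick", "free kick", "scored"},
    {"free", "kick", "free kick", "missed"},
]
labels = ["finance", "finance", "sport", "sport"]
pool = sorted(set().union(*docs))

best = evolve(pool, docs, labels)
model = fit_centroids(docs, labels, best)  # train the classifier on the chosen subset
new_doc = {"interest", "rates", "interest rates", "rose", "fell"}
print(best, predict(model, new_doc, best))
```

Scoring fitness on the training documents keeps the sketch short; a practical system would more likely score candidate subsets on held-out classified material to avoid favoring subsets that merely memorize the training set.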