A system for analyzing large volumes of data representing chemical structure and activity
information and for concisely presenting conclusions about structure-to-activity
relationships. A computer may adaptively learn new substructure descriptors based
on its analysis of the input data. The computer may then apply each substructure
descriptor as a filter to establish new groups of molecules that match the descriptor.
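
The descriptor-learning and filtering steps can be pictured with a minimal Python sketch. It is not the claimed method. It assumes, purely for illustration, that each molecule has already been reduced to a set of substructure fragment IDs. It also assumes that a candidate descriptor is a single fragment, scored by how far the mean activity of the molecules matching it departs from the overall mean. The names `Molecule`, `learn_descriptor`, `apply_filter`, and `min_group` are hypothetical. A real system would likely use proper substructure matching and a different scoring criterion.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass(frozen=True)
class Molecule:
    name: str
    fragments: frozenset  # hypothetical: precomputed substructure fragment IDs
    activity: float       # e.g. a measured potency value; higher = more active


def learn_descriptor(molecules: List[Molecule], min_group: int = 2) -> Optional[str]:
    """Return the fragment whose matching subgroup shifts mean activity the most.

    Assumption: a descriptor is a single fragment ID, scored by how far the mean
    activity of matching molecules departs from the mean of the whole input set.
    """
    overall = mean(m.activity for m in molecules)
    best, best_score = None, 0.0
    # Iterate in sorted order so that ties go to the alphabetically first fragment.
    for frag in sorted(set().union(*(m.fragments for m in molecules))):
        hits = [m.activity for m in molecules if frag in m.fragments]
        # Skip descriptors that match too few molecules or that match all of them.
        if not (min_group <= len(hits) < len(molecules)):
            continue
        score = abs(mean(hits) - overall)
        if score > best_score:
            best, best_score = frag, score
    return best


def apply_filter(molecules: List[Molecule], descriptor: str) -> List[Molecule]:
    """Use a descriptor as a filter: the new group is every molecule that matches it."""
    return [m for m in molecules if descriptor in m.fragments]


# Hypothetical example data.
data = [
    Molecule("a", frozenset({"phenyl", "amide"}), 8.0),
    Molecule("b", frozenset({"phenyl", "ester"}), 7.0),
    Molecule("c", frozenset({"pyridine", "amide"}), 5.0),
    Molecule("d", frozenset({"pyridine", "ester"}), 4.0),
]
descriptor = learn_descriptor(data)     # "phenyl" ties with "pyridine"; name breaks the tie
group = apply_filter(data, descriptor)  # molecules "a" and "b"
```

Raising `min_group` prevents descriptors that match only a handful of molecules from being learned, at the cost of possibly finding no descriptor at all (the function then returns `None`).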
From each new group of molecules, the computer may in turn generate one or more
additional new groups of molecules. In an exemplary arrangement, the analysis produces
a tree structure that reflects pharmacophoric information and, through the lineage of
its nodes, efficiently establishes the likely effect of various chemical substructures
on activity. The tree structure can then be applied as a multi-domain classifier to
help a chemist classify test compounds into structural subclasses.
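
The following Python sketch shows one way repeated filtering could produce such a tree and how the tree could then place a test compound in a structural subclass. It rests on illustrative assumptions: molecules are reduced to sets of fragment IDs, and descriptors are ranked by an ad hoc rule (absolute shift in mean activity). The names `build_tree`, `classify`, `max_depth`, `branching`, and `min_group` are hypothetical. Each node's `lineage` lists the descriptors applied on the path from the root. Its `delta` records how mean activity changed when the last descriptor was applied, which is how the lineage reflects a substructure's likely effect on activity.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class Molecule:
    name: str
    fragments: frozenset  # hypothetical: precomputed substructure fragment IDs
    activity: float


@dataclass
class Node:
    lineage: Tuple[str, ...]  # descriptors applied on the path from the root
    members: List[Molecule]
    mean_activity: float
    delta: float              # this node's mean activity minus its parent's
    children: List["Node"] = field(default_factory=list)


def rank_descriptors(mols: List[Molecule], min_group: int) -> List[str]:
    """Assumed scoring: |mean activity of matching subgroup - mean of mols|, best first."""
    overall = mean(m.activity for m in mols)
    scored = []
    for frag in set().union(*(m.fragments for m in mols)):
        hits = [m.activity for m in mols if frag in m.fragments]
        if min_group <= len(hits) < len(mols):  # descriptor must actually split the group
            scored.append((abs(mean(hits) - overall), frag))
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest score first; ties broken by name
    return [frag for _, frag in scored]


def build_tree(mols, lineage=(), parent_mean=None, depth=0,
               max_depth=2, branching=2, min_group=2) -> Node:
    """Recursively filter each group by its top-ranked descriptors to form child groups."""
    m = mean(x.activity for x in mols)
    node = Node(lineage, mols, m, 0.0 if parent_mean is None else m - parent_mean)
    if depth < max_depth:
        for frag in rank_descriptors(mols, min_group)[:branching]:
            subgroup = [x for x in mols if frag in x.fragments]  # descriptor as a filter
            node.children.append(build_tree(subgroup, lineage + (frag,), m, depth + 1,
                                            max_depth, branching, min_group))
    return node


def walk(node: Node) -> Iterator[Node]:
    """Yield every node in the tree, parents before children."""
    yield node
    for child in node.children:
        yield from walk(child)


def classify(node: Node, fragments) -> Tuple[str, ...]:
    """Descend into the first child whose descriptor the test compound contains."""
    for child in node.children:
        if child.lineage[-1] in fragments:
            return classify(child, fragments)
    return node.lineage  # () means no structural subclass matched


def _mol(name, frags, act):
    return Molecule(name, frozenset(frags), act)


# Hypothetical example data.
data = [
    _mol("a", {"phenyl", "F", "amide"}, 9.0), _mol("b", {"phenyl", "F", "ester"}, 8.0),
    _mol("c", {"phenyl", "amide"}, 7.0), _mol("d", {"phenyl", "ester"}, 6.0),
    _mol("e", {"pyridine", "F", "amide"}, 5.0), _mol("f", {"pyridine", "F", "ester"}, 4.0),
    _mol("g", {"pyridine", "amide"}, 3.0), _mol("h", {"pyridine", "ester"}, 2.0),
]
tree = build_tree(data)
for n in walk(tree):
    print(" > ".join(n.lineage) or "(root)",
          f"mean={n.mean_activity:.2f}", f"delta={n.delta:+.2f}")
```

In this sketch `classify` follows the first matching child at each level, so a test compound's subclass is the deepest lineage whose descriptors it contains. Letting `branching` exceed one is what allows a single group to spawn several new groups.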