A system and method are provided for automatically populating an existing
concept hierarchy of items with new items, using entropy as a measure of
the correctness of a potential classification. User-defined concept
hierarchies include, for example, document hierarchies such as
directories for the Internet, library catalogues, patent databases and
journals, and product hierarchies. These concept hierarchies can be huge
and are usually maintained manually. An Internet directory may have, for
example, millions of Web sites, thousands of editors and hundreds of
thousands of different categories. The method for populating a concept
hierarchy includes calculating conditional entropy values representing
the randomness of the distribution of classification attributes across the
hierarchical set of classes if a new item is added to a specific class of
the hierarchy, and then selecting the class that yields the minimum
randomness of distribution when calculated on the condition that the new
data item is inserted into that class.
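
The selection step described above can be illustrated with a minimal sketch. It is not the claimed implementation: it assumes a flat set of classes rather than a full hierarchy, treats each item as a list of attribute tokens, and weights each class by its share of attribute occurrences. The names `entropy`, `conditional_entropy`, and `classify` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as attribute counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in counts.values() if n > 0)


def conditional_entropy(classes):
    """H(attribute | class): per-class attribute entropy, weighted by each
    class's share of all attribute occurrences (an assumed weighting)."""
    per_class = {name: Counter(attr for item in items for attr in item)
                 for name, items in classes.items()}
    grand_total = sum(sum(c.values()) for c in per_class.values())
    if grand_total == 0:
        return 0.0
    return sum((sum(c.values()) / grand_total) * entropy(c)
               for c in per_class.values())


def classify(classes, new_item):
    """Tentatively insert new_item into each class in turn and return the
    class giving the lowest conditional entropy, plus all scores."""
    scores = {}
    for name in classes:
        # Copy the item lists so the trial insertion leaves the input intact.
        trial = {k: list(v) for k, v in classes.items()}
        trial[name].append(new_item)
        scores[name] = conditional_entropy(trial)
    best = min(scores, key=scores.get)
    return best, scores


# Toy data (invented for illustration): two classes of tokenized items.
classes = {
    "sports": [["ball", "goal"], ["ball", "team"]],
    "cooking": [["oven", "recipe"], ["recipe", "salt"]],
}
best, scores = classify(classes, ["ball", "team"])
print(best)  # sports
```

In this toy case the new item shares its attributes with the "sports" class, so inserting it there leaves the attribute distribution less random than inserting it into "cooking", and "sports" is selected.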