A method and system are provided for automatically identifying an optimal
set of attributes of entities included in a networked system. Entity types are
ranked based on information gain. A first classification accuracy
relative to a first entity type is determined. The first entity type is
the top-ranked entity type or a first aggregate entity type. A second
entity type is selected based on the ranking. A database join of a first
set of attributes associated with the first entity type and a second set
of attributes associated with the second entity type is performed. A
second classification accuracy relative to a second aggregate entity type
generated by the join is determined. In response to determining that the
second classification accuracy is not greater than the first
classification accuracy, the first set of attributes is identified as the
optimal set of attributes contributing to a problem in the networked
system.
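
The selection loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the entity types (router, link, host), their attributes, the sample data, and the function names are all hypothetical; information gain is assumed to be measured on the problem label with respect to each entity type's attribute tuple; and a leave-one-out majority-vote lookup stands in for whatever classifier the system actually uses.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())

def information_gain(table, labels):
    """Entropy reduction from splitting the samples by attribute tuple."""
    groups = defaultdict(list)
    for key, attrs in table.items():
        groups[attrs].append(labels[key])
    all_labels = [labels[key] for key in table]
    remainder = sum(len(g) / len(all_labels) * entropy(g) for g in groups.values())
    return entropy(all_labels) - remainder

def join(left, right):
    """Inner join of two attribute tables on their shared sample key."""
    return {key: left[key] + right[key] for key in left if key in right}

def accuracy(table, labels):
    """Leave-one-out accuracy of a majority-vote lookup (assumed stand-in classifier)."""
    votes = defaultdict(Counter)
    for key, attrs in table.items():
        votes[attrs][labels[key]] += 1
    correct = 0
    for key, attrs in table.items():
        others = votes[attrs].copy()
        others[labels[key]] -= 1  # hold out the current sample
        label, count = others.most_common(1)[0]
        # An attribute tuple seen nowhere else counts as a misclassification.
        correct += count > 0 and label == labels[key]
    return correct / len(table)

def select_attributes(tables, labels):
    """Join entity types in ranked order until accuracy stops improving."""
    ranked = sorted(tables, key=lambda name: information_gain(tables[name], labels),
                    reverse=True)
    selected, current = [ranked[0]], tables[ranked[0]]
    best = accuracy(current, labels)
    for name in ranked[1:]:
        candidate = join(current, tables[name])  # aggregate entity type
        score = accuracy(candidate, labels)
        if score <= best:  # not greater: keep the current set and stop
            break
        selected, current, best = selected + [name], candidate, score
    return ranked, selected, best

# Hypothetical data: ten samples keyed 0..9, six labeled "fault" and four "ok".
labels = dict(enumerate(["fault"] * 6 + ["ok"] * 4))
cpu = ["high", "high", "high", "high", "low", "low", "low", "low", "low", "low"]
loss = ["low", "low", "high", "low", "high", "high", "low", "low", "low", "low"]
os_name = ["a", "b"] * 5  # deliberately uninformative attribute
tables = {
    "router": {k: (v,) for k, v in enumerate(cpu)},
    "link": {k: (v,) for k, v in enumerate(loss)},
    "host": {k: (v,) for k, v in enumerate(os_name)},
}
ranked, selected, best = select_attributes(tables, labels)
print(ranked, selected, best)
```

With this toy data the ranking is router, link, host. Joining the link attributes onto the router attributes raises the accuracy from 0.8 to 0.9, while the further join with the host attributes lowers it to 0.6, so the loop stops and reports the router and link attributes as the selected set.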