A system and method to facilitate automatic identification of event
classification errors in a network are described. Session data containing
events logged by a user entity over a network within a predetermined period
of time is retrieved from one or more event logs. Each event is parsed to
generate one or more event units. A frequency parameter describing the
co-occurrence of the event units within each event of the session data is
then determined, and at least one session distance between the events is
computed from that frequency parameter. One or more classification
distances are also retrieved from a data storage module, such as a
database or a datastore, the classification distances representing a
relation between the events and their corresponding classification
categories within the database or datastore. Each session distance is
compared to each retrieved classification distance to identify event
classification errors within the database or datastore. A predetermined
error code is then assigned to each event whose corresponding session and
classification distances differ.
Finally, a list of classification errors containing the identified events
and their corresponding error codes is output for further review and
analysis, and for correction through either manual or automatic editing.
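The pipeline summarized above can be illustrated with a small, self-contained sketch. It is not the described system's implementation; every concrete choice here is an assumption made for illustration: event units are taken to be lowercase word tokens, the frequency parameter is the number of events in which each pair of units co-occurs, the session distance is a cosine distance over those pair counts, the classification distance is simply 0 for events stored in the same category and 1 otherwise, and the threshold of 0.5 and the error codes `E_SPLIT` and `E_MERGE` are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical error codes (the abstract only specifies "a predetermined error code").
ERR_SPLIT = "E_SPLIT"  # close in the session, but stored in different categories
ERR_MERGE = "E_MERGE"  # distant in the session, but stored in the same category


def parse_units(event_text):
    """Parse an event into event units (assumption: lowercase word tokens)."""
    return set(re.findall(r"[a-z0-9]+", event_text.lower()))


def cooccurrence(unit_sets):
    """Frequency parameter (assumed): number of events in which each unit pair co-occurs."""
    counts = Counter()
    for units in unit_sets:
        for pair in combinations(sorted(units), 2):
            counts[pair] += 1
    return counts


def event_vector(units, counts):
    """Weight each unit pair of one event by its session-wide co-occurrence count."""
    return {pair: counts[pair] for pair in combinations(sorted(units), 2)}


def cosine_distance(u, v):
    """1 - cosine similarity of two sparse vectors; 1.0 if either vector is empty."""
    dot = sum(w * v.get(k, 0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def find_classification_errors(events, categories, threshold=0.5):
    """events: {event_id: text}; categories: {event_id: stored category}.

    Returns a sorted list of (event_id, error_code) pairs for review.
    """
    units = {eid: parse_units(text) for eid, text in events.items()}
    counts = cooccurrence(units.values())
    vectors = {eid: event_vector(u, counts) for eid, u in units.items()}

    errors = {}
    for a, b in combinations(sorted(events), 2):
        # Session distance, binarized: 0 = close in the session, 1 = distant.
        session_d = 0 if cosine_distance(vectors[a], vectors[b]) <= threshold else 1
        # Classification distance (assumed): 0 = same stored category, 1 = different.
        class_d = 0 if categories[a] == categories[b] else 1
        if session_d == class_d:
            continue  # the two distances agree, so no error is flagged
        code = ERR_SPLIT if session_d == 0 else ERR_MERGE
        # An event keeps the first error code assigned to it.
        errors.setdefault(a, code)
        errors.setdefault(b, code)
    return sorted(errors.items())


events = {
    "e1": "login failed for user alice",
    "e2": "login failed for user bob",
    "e3": "disk usage high on server",
}
categories = {"e1": "auth", "e2": "storage", "e3": "storage"}
print(find_classification_errors(events, categories))
# [('e1', 'E_SPLIT'), ('e2', 'E_SPLIT'), ('e3', 'E_MERGE')]
```

In this example, `e2` resembles the authentication event `e1` but is stored under `storage`, so both are flagged; `e3` is flagged because it shares a category with the dissimilar `e2`. Because the comparison is pairwise, the output is a candidate list for the review step rather than a definitive verdict on which event is misclassified.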