Similar data in a data repository are adaptively matched to determine
whether two or more data items are related in accordance with configurable
criteria.
Matches are adapted by learning and presenting appropriate match criteria
based on previous user input. The system can merge the data items into
a single master data item, group similar items, and perform further
processing based on the result. The configurable match criteria presented
to a user
are adapted by the system based on previous interactions of the system
with users. Matching is performed by selecting data items to match,
removing frequently used strings, normalizing data, tokenizing multi-word
data items, assigning a weight to each token, calculating a score using
the assigned weights, generating groups of similar records, and assigning
thresholds for match levels. Adapting choices of match criteria for a
user based on past interaction allows for rapid match creation and match
maintenance that optimizes data integrity across an enterprise.
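
The matching steps listed above can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration, not something the abstract specifies: the noise-word list, the threshold values, the inverse-frequency weighting, the weighted Jaccard score, and the greedy grouping rule are hypothetical choices, as are all function names.

```python
import math
import re
from collections import Counter

# Hypothetical "frequently used strings" and match-level thresholds;
# the source gives no concrete values for either.
NOISE_WORDS = {"inc", "ltd", "llc", "corp", "co", "the"}
THRESHOLDS = [(0.85, "exact"), (0.5, "probable")]  # (minimum score, level)


def tokenize(text):
    """Normalize (lowercase, strip punctuation), split into tokens, drop noise words."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return [t for t in cleaned.split() if t not in NOISE_WORDS]


def token_weights(records):
    """Assumed weighting: rarer tokens across the record set get larger weights."""
    doc_freq = Counter()
    for rec in records:
        doc_freq.update(set(tokenize(rec)))
    n = len(records)
    return {tok: math.log(1 + n / df) for tok, df in doc_freq.items()}


def score(a, b, weights):
    """Weighted Jaccard score in [0, 1] over the two records' token sets."""
    ta, tb = set(tokenize(a)), set(tokenize(b))
    union = sum(weights.get(t, 1.0) for t in ta | tb)
    shared = sum(weights.get(t, 1.0) for t in ta & tb)
    return shared / union if union else 0.0


def match_level(s):
    """Map a score to the first match level whose threshold it meets."""
    for minimum, level in THRESHOLDS:
        if s >= minimum:
            return level
    return "no match"


def group_records(records, min_score=0.5):
    """Greedy grouping: add a record to the first group holding a match, else start a new group."""
    weights = token_weights(records)
    groups = []
    for rec in records:
        for group in groups:
            if any(score(rec, other, weights) >= min_score for other in group):
                group.append(rec)
                break
        else:
            groups.append([rec])
    return groups
```

With records such as "Acme Corp.", "ACME, Inc." and "Globex LLC", the first two reduce to the same token after normalization and noise-word removal, so they land in one group while the third starts its own. A merge step would then pick or build the master item from each group.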