The present invention is directed to a method of identifying duplicate
data elements in large data sets. The method comprises receiving a data
set; dividing each data element in the data set into a series of data
segments to define data keys; generating an intermediate value for each
element in the data set using summed values for the data keys; sorting
the data entries using the intermediate values; sorting the entries having
matched intermediate values using the data keys; and identifying the
duplicate data elements in the data set.
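
The steps of the method can be illustrated with a minimal sketch. It is not the claimed implementation, and several details are assumptions made only for illustration: data elements are taken to be strings, each data segment is a fixed 4-byte slice (the hypothetical constant SEGMENT_WIDTH), each data key is read as a big-endian unsigned integer for summing, and the names data_keys and find_duplicates are invented here.

```python
from itertools import groupby

SEGMENT_WIDTH = 4  # assumed segment size in bytes; the source does not specify one


def data_keys(element, width=SEGMENT_WIDTH):
    """Divide an element into fixed-width byte segments; each segment is a data key."""
    raw = element.encode("utf-8")
    return tuple(raw[i:i + width] for i in range(0, len(raw), width))


def find_duplicates(elements):
    """Return lists of original indices whose elements are identical."""
    # One entry per element: (intermediate value, data keys, original index).
    # The intermediate value is the sum of the keys read as unsigned integers.
    entries = []
    for idx, element in enumerate(elements):
        keys = data_keys(element)
        intermediate = sum(int.from_bytes(k, "big") for k in keys)
        entries.append((intermediate, keys, idx))

    # First sort: by intermediate value only (a cheap single-integer comparison).
    entries.sort(key=lambda e: e[0])

    duplicates = []
    for _, bucket in groupby(entries, key=lambda e: e[0]):
        # Second sort: only entries with matching intermediate values are
        # compared on their full data keys.
        bucket = sorted(bucket, key=lambda e: e[1])
        for _, run in groupby(bucket, key=lambda e: e[1]):
            indices = [e[2] for e in run]
            if len(indices) > 1:
                duplicates.append(indices)
    return duplicates


# "abcdwxyz" and "wxyzabcd" share an intermediate value (same segments,
# different order) but are separated by the second sort on the data keys.
print(find_duplicates(["abcdwxyz", "wxyzabcd", "abcdwxyz", "unique"]))  # [[0, 2]]
```

In this sketch the summed value acts as an inexpensive first-pass sort key, so the more costly comparison on full data keys is confined to entries whose intermediate values already match.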