Process for identifying duplicate values in very large data sets

The present invention is directed to a method of identifying duplicate data elements in large data sets. The method involves receiving the data set; dividing each data element in the data set into a series of data segments that define data keys; generating an intermediate value for each element in the data set from the summed values of its data keys; sorting the data entries by their intermediate values; sorting the entries with matching intermediate values by their data keys; and identifying the duplicate data elements in the data set.
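The steps of the abstract can be sketched in a few lines of Python. This is only an illustrative reading, not the patented implementation: the fixed segment width, the big-endian integer interpretation of each segment, and the names `data_keys` and `find_duplicates` are all assumptions introduced here, and the sketch sorts in memory rather than addressing very large data sets.

```python
from itertools import groupby
from typing import List, Tuple


def data_keys(element: bytes, width: int = 4) -> Tuple[int, ...]:
    """Split an element into fixed-width segments and read each as an integer key.

    The segment width and big-endian encoding are assumptions for illustration.
    """
    return tuple(
        int.from_bytes(element[i:i + width], "big")
        for i in range(0, len(element), width)
    )


def find_duplicates(elements: List[bytes], width: int = 4) -> List[bytes]:
    """Return each element that occurs more than once, reported a single time."""
    # Build (intermediate value, data keys, element) for every entry;
    # the intermediate value is the sum of the entry's data keys.
    entries = []
    for element in elements:
        keys = data_keys(element, width)
        entries.append((sum(keys), keys, element))

    # First sort: order all entries by the cheap intermediate value.
    entries.sort(key=lambda entry: entry[0])

    duplicates = []
    for _, run in groupby(entries, key=lambda entry: entry[0]):
        # Second sort: only entries sharing an intermediate value are ordered
        # by their data keys. The element itself is a tie-breaker, because a
        # short trailing segment can give different elements identical keys.
        ordered = sorted(run, key=lambda entry: (entry[1], entry[2]))
        for _, same in groupby(ordered, key=lambda entry: (entry[1], entry[2])):
            group = list(same)
            if len(group) > 1:
                duplicates.append(group[0][2])
    return duplicates
```

For example, with `width=2` the elements `b"abcd"` and `b"cdab"` share the same summed intermediate value but have different data keys, so `find_duplicates([b"abcd", b"cdab", b"abcd"], width=2)` returns `[b"abcd"]`. The point of the intermediate value in this sketch is that the more expensive key-by-key comparison is confined to the small runs of entries whose sums already match.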
