The invention determines population size and population overlap in data
sets that contain records on unique entities but no unique identifiers
for those entities, and that share at least one common type of
information with a known distribution of finite expectation, by
decomposing probabilistic calculations. A computer determines the
population overlap of unique entities between the data sets by
subtracting the probabilistic incremental number of unique entities
needed to yield a larger total number of values, drawn from the data
sets, of the information with the known distribution. The invention can
also maintain the
security of private data by allowing a remote computer, where the
original data is stored, to download diagnostic and aggregation
procedures from another computer over a network. The remote computer
runs these procedures on the data and forwards the results to an
estimate processor computer over the network. The estimate processor
determines population size and overlap from the aggregate results and
returns this information to the remote computer over the network. The
invention also
determines the overlap of three or more data sets by concatenating all
combinations of the data sets and determining an estimate for each
subset of those combinations. These operations involve the cancellation
of equivalent terms that have opposite signs.
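
The estimation and overlap steps described above can be illustrated with
a minimal sketch. It assumes, purely for illustration, that the common
type of information is uniformly distributed over a fixed number of
possible values (for example, 365 days of birth), and it uses the
classical coupon-collector expectation as a stand-in for the
probabilistic calculation; the text above does not confirm that exact
formula. All function names and the domain_size parameter are
hypothetical.

```python
from itertools import combinations


def expected_entities(distinct_values, domain_size=365):
    """Expected number of entities needed to observe `distinct_values`
    distinct values, ASSUMING values are uniform over `domain_size`.
    Each term is the incremental number of entities needed to see one
    more new value (coupon-collector expectation)."""
    if not 0 <= distinct_values < domain_size:
        raise ValueError("distinct_values must be in [0, domain_size)")
    return sum(domain_size / (domain_size - i) for i in range(distinct_values))


def estimate_population(values, domain_size=365):
    """Estimate the number of unique entities behind a list of values."""
    return expected_entities(len(set(values)), domain_size)


def estimate_overlap(values_a, values_b, domain_size=365):
    """Two-set overlap: size(A) + size(B) - size(A concatenated with B)."""
    pooled = list(values_a) + list(values_b)
    return (estimate_population(values_a, domain_size)
            + estimate_population(values_b, domain_size)
            - estimate_population(pooled, domain_size))


def estimate_common_overlap(data_sets, domain_size=365):
    """Overlap common to all data sets via inclusion-exclusion over the
    concatenation of every non-empty combination of the data sets.
    Odd-sized combinations are added, even-sized ones subtracted, so
    equivalent terms with opposite signs cancel."""
    total = 0.0
    for r in range(1, len(data_sets) + 1):
        sign = 1 if r % 2 == 1 else -1
        for subset in combinations(data_sets, r):
            pooled = [v for ds in subset for v in ds]
            total += sign * estimate_population(pooled, domain_size)
    return total
```

For two data sets, estimate_common_overlap reduces to estimate_overlap.
Because each estimate is an expectation, results for small data sets are
noisy and an estimated overlap can even be slightly negative; in this
sketch the estimate is undefined once every possible value has been
observed.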