Random samples without replacement are extracted from a distributed set of
items by aggregating samples drawn from subsets of the distributed set.
This provides a uniform random sample without
replacement representative of the distributed set, allowing statistical
information to be gleaned from extremely large sets of distributed
information. Subset random samples without replacement are extracted from
independent subsets of the distributed set of items. The subset random
samples are then aggregated to provide a uniform random sample without
replacement of a fixed size that is representative of a distributed set
of items of unknown size. In one instance, a multivariate hypergeometric
distribution is sampled by decomposing it into a set of univariate
hypergeometric distributions. Individual items of a uniform random sample
without replacement are then determined utilizing a normal approximation
of the univariate hypergeometric distributions and a finite population
correction factor.
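
The aggregation described above can be illustrated with a minimal Python sketch. It is one possible reading, not the claimed implementation. It assumes each subset reports its item count together with its own uniform sample of up to n items, and that n does not exceed the total item count. The function names (approx_hypergeometric, split_sample_size, merge_samples) are hypothetical. The per-subset counts are drawn one subset at a time from univariate hypergeometric distributions, each approximated by a normal distribution whose variance includes the finite population correction (N - n) / (N - 1); rounding and clamping keep every count feasible, so the result is only approximately uniform.

```python
import math
import random


def approx_hypergeometric(rng, population, successes, draws):
    """Approximate one draw from Hypergeometric(population, successes, draws).

    Uses a normal approximation with the finite population correction,
    then rounds and clamps the value to the distribution's support.
    """
    lo = max(0, draws - (population - successes))
    hi = min(draws, successes)
    if lo == hi:
        # Only one feasible value; this also covers degenerate populations.
        return lo
    p = successes / population
    mean = draws * p
    fpc = (population - draws) / (population - 1)  # finite population correction
    std = math.sqrt(draws * p * (1.0 - p) * fpc)
    value = round(rng.gauss(mean, std))
    return min(hi, max(lo, value))


def split_sample_size(rng, subset_sizes, n):
    """Decide how many of the n sampled items each subset contributes.

    The multivariate hypergeometric draw is broken into a chain of
    univariate draws: subset i against all subsets that come after it.
    """
    remaining_population = sum(subset_sizes)
    remaining_draws = n
    counts = []
    for size in subset_sizes:
        c = approx_hypergeometric(rng, remaining_population, size, remaining_draws)
        counts.append(c)
        remaining_population -= size
        remaining_draws -= c
    return counts


def merge_samples(rng, subset_samples, subset_sizes, n):
    """Combine per-subset samples into one sample of exactly n items.

    Assumes subset_samples[i] is a uniform sample without replacement of
    min(n, subset_sizes[i]) items from subset i.
    """
    counts = split_sample_size(rng, subset_sizes, n)
    merged = []
    for sample, c in zip(subset_samples, counts):
        # A uniform subsample of a uniform sample is itself uniform.
        merged.extend(rng.sample(sample, c))
    return merged
```

A caller would pass a random.Random instance, the per-subset samples, the per-subset item counts, and the target sample size n; the clamping in approx_hypergeometric guarantees that the counts sum to n and that no subset is asked for more items than its sample holds.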