A method of estimating an aggregate of a join over data-streams in
real-time using skimmed sketches, that only examines each data element
once and has a worst case space requirement of O(n.sup.2/J), where J is
the size of the join and n is the number of data elements. The skimmed
sketch is an atomic sketch, formed as the inner product of the
data-stream frequency vector and a random binary variable, from which the
frequency values that exceed a predetermined threshold have been skimmed
off and placed in a dense frequency vector. The join size is estimated as
the sum of the sub-joins of skimmed sketches and dense frequency vectors.
The atomic sketches may be arranged in a hash structure so that
processing a data element only requires updating a single sketch per hash
table. This keeps the per-element overhead logarithmic in the domain and
stream sizes.