A method and apparatus for creating a statistical representation of a
query result without executing the underlying query.
For a binary-join query, one of the join tables is scanned. For each
scanned tuple, a multiplicity value is calculated that estimates the
number of tuples in the other join table whose join attribute matches
that of the scanned tuple. A number of copies of the scanned tuple,
determined by the multiplicity value, are placed into a stream of tuples,
and the stream is sampled to compile the statistical representation of
the query result. For acyclic-join queries, which may also include
selections, the procedure is extended recursively. If multiple
statistical representations are sought,
scans can be shared. Scan sharing can be optimized using shortest common
supersequence techniques.
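The binary-join procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: tuples are plain Python tuples, the hypothetical helper multiplicity_from_counts computes exact match counts from the other table's join keys (the method itself only requires an estimate, which could come from a histogram or an index), and reservoir sampling (Algorithm R) is assumed as the way the stream is sampled. The join result is never materialized; each copy is offered to the sampler and then discarded.

```python
import random
from collections import Counter
from typing import Callable, Hashable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def multiplicity_from_counts(other_keys: Iterable[Hashable]) -> Callable[[Hashable], int]:
    """Hypothetical helper: exact per-key match counts for the other join table.

    A real system might estimate these from a histogram or index instead.
    """
    counts = Counter(other_keys)
    return lambda key: counts.get(key, 0)


def sample_binary_join(
    scanned: Iterable[T],
    join_key: Callable[[T], Hashable],
    multiplicity: Callable[[Hashable], int],
    k: int,
    rng: random.Random,
) -> Tuple[List[T], int]:
    """Sample k items from the virtual stream in which each scanned tuple t
    appears multiplicity(join_key(t)) times, without executing the join.

    Assumption: the stream is sampled with reservoir sampling (Algorithm R).
    Returns the sample and the stream length, i.e. the estimated join size.
    """
    reservoir: List[T] = []
    seen = 0  # number of stream items offered to the sampler so far
    for t in scanned:  # single scan of one join table
        m = multiplicity(join_key(t))
        for _ in range(m):  # place m copies of t into the stream
            seen += 1
            if len(reservoir) < k:
                reservoir.append(t)
            else:
                j = rng.randrange(seen)  # uniform over [0, seen)
                if j < k:  # keep this copy with probability k / seen
                    reservoir[j] = t
    return reservoir, seen
```

For example, scanning [(1, "a"), (2, "b"), (3, "c")] against another table whose join keys are [1, 1, 2, 4] gives multiplicities 2, 1 and 0, so the stream has length 3, which equals the join size. When k is at least the stream length, the reservoir simply holds every copy. Because each sampled item is a scanned-table tuple weighted by its join fan-out, the sample directly supports statistics on that table's attributes over the join result.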
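Scan sharing can likewise be sketched. The sketch assumes, as a simplification not stated in the source, that each requested statistical representation needs its tables scanned in a fixed order, given as a list of table names, and that one scan of a table in a shared schedule can serve every representation waiting on that table. A shared schedule is then a common supersequence of all the scan orders, and its length is the number of scans performed. The function scs_pair computes an exact shortest common supersequence of two orders by dynamic programming. For more than two orders the problem is NP-hard, so the hypothetical shared_scan_schedule folds scs_pair over the list, a greedy heuristic that is not guaranteed to be optimal.

```python
from functools import reduce
from typing import List, Sequence


def scs_pair(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Exact shortest common supersequence of two scan orders."""
    n, m = len(a), len(b)
    # dp[i][j] = length of the SCS of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n:
                dp[i][j] = m - j
            elif j == m:
                dp[i][j] = n - i
            elif a[i] == b[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]  # one scan serves both orders
            else:
                dp[i][j] = 1 + min(dp[i + 1][j], dp[i][j + 1])
    # Walk the table to reconstruct one optimal schedule.
    out: List[str] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] <= dp[i][j + 1]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def shared_scan_schedule(scan_orders: List[Sequence[str]]) -> List[str]:
    """Hypothetical greedy heuristic: fold pairwise SCS over all scan orders."""
    return list(reduce(scs_pair, scan_orders, []))
```

For scan orders R, S, T and S, T, U, scs_pair returns R, S, T, U: four scans instead of the six needed if each representation scanned its tables separately.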