A method for estimating the result of a query on a database having data
records arranged in tables. The database has an expected workload that
includes a set of queries that can be executed on the database. A sample
is constructed by selecting data records for inclusion in the sample in a
manner that minimizes an estimation error when the data records are acted
upon by a query in the expected workload to provide an estimated result.
The query is executed on the sample and returns an estimated query
result. The expected workload can be constructed from a given workload
by specifying a degree of overlap between records selected by queries in
the given workload and records selected by queries in the expected
workload.
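
The idea of a workload-aware sample can be illustrated with a minimal
sketch. It is not the claimed method: it swaps in a simple heuristic
(weight each record by how many workload queries select it) in place of
an explicit error minimization, and it uses the standard Hansen-Hurwitz
estimator to scale the sampled values. All names (record_weights,
build_sample, estimate_sum, the floor parameter, the "region" and
"amount" fields) are hypothetical and chosen only for illustration. The
floor parameter is an assumed, crude stand-in for the possibility that
future queries select records the given workload does not.

```python
import random


def record_weights(records, workload, floor=1.0):
    """Weight = floor + number of workload predicates selecting the record.

    Assumption: a positive floor keeps every record eligible, so queries
    that only partly overlap the given workload can still be answered.
    """
    return [floor + sum(1 for pred in workload if pred(r)) for r in records]


def build_sample(records, weights, k, seed=0):
    """Draw k records with replacement, with probability proportional to weight.

    Each sampled record is paired with its per-draw selection probability.
    """
    rng = random.Random(seed)
    total = sum(weights)
    probs = [w / total for w in weights]
    picks = rng.choices(range(len(records)), weights=weights, k=k)
    return [(records[i], probs[i]) for i in picks]


def estimate_sum(sample, predicate, column):
    """Hansen-Hurwitz estimate of SUM(column) over records matching predicate."""
    k = len(sample)
    return sum(r[column] / p for r, p in sample if predicate(r)) / k


# Hypothetical table: ten records, three of them in region "a".
records = [{"region": "a" if i < 3 else "b", "amount": 2.0} for i in range(10)]
# Hypothetical given workload: one query selecting region "a".
workload = [lambda r: r["region"] == "a"]

weights = record_weights(records, workload)
sample = build_sample(records, weights, k=5)
estimate = estimate_sum(sample, lambda r: r["region"] == "a", "amount")
```

Records that the workload selects more often are more likely to be in
the sample, so queries resembling the workload are answered from more
relevant rows; dividing each value by its selection probability keeps
the estimate unbiased for any predicate.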