A method for estimating the result of a query on a database having data
records arranged in tables. An expected workload is derived for the
database, comprising a set of queries that can be executed on the
database. A sample is constructed by selecting data
records for inclusion in the sample in a manner that minimizes an
estimation error when the data records are acted upon by a query in the
expected workload to provide an expected
result. The query is then executed on the sample,
returning an estimated query result. The expected workload can be
constructed from a given workload by specifying a degree of overlap
between records selected by
queries in the given workload and records selected by queries in the
expected workload.
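
The general idea of building a sample biased toward an expected workload and then answering a query from that sample can be sketched as follows. This is a minimal illustration under stated assumptions, not the method described above: it weights each record by the number of workload queries that select it, draws records with replacement in proportion to those weights, and scales sampled values by their inverse selection probability (a Hansen-Hurwitz style estimator) to estimate a SUM. The function names, the smoothing parameter, and the record layout are all hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence

Record = Dict[str, float]
Predicate = Callable[[Record], bool]


def selection_probabilities(
    records: Sequence[Record],
    workload: Sequence[Predicate],
    smoothing: float = 1.0,
) -> List[float]:
    """Give each record a draw probability proportional to
    smoothing + (number of workload queries that select it).
    A positive smoothing value (an assumption of this sketch) keeps
    records outside the workload reachable."""
    weights = [smoothing + sum(1 for q in workload if q(r)) for r in records]
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one record must have a positive weight")
    return [w / total for w in weights]


def draw_sample(
    records: Sequence[Record],
    probs: Sequence[float],
    sample_size: int,
    seed: int = 0,
) -> List[int]:
    """Draw record indices with replacement according to probs."""
    rng = random.Random(seed)
    return rng.choices(range(len(records)), weights=probs, k=sample_size)


def estimate_sum(
    records: Sequence[Record],
    probs: Sequence[float],
    sample: Sequence[int],
    predicate: Predicate,
    column: str,
) -> float:
    """Estimate SUM(column) over records matching predicate, using only
    the sampled indices. Each matching draw contributes value / probability;
    the contributions are averaged over the number of draws."""
    if not sample:
        return 0.0
    total = 0.0
    for i in sample:
        if predicate(records[i]):
            total += records[i][column] / probs[i]
    return total / len(sample)


if __name__ == "__main__":
    # Hypothetical table: one row per sale.
    table = [{"year": 2000.0 + (i % 5), "amount": float(i)} for i in range(1000)]
    # Hypothetical expected workload: queries focused on recent years.
    expected_workload = [
        lambda r: r["year"] >= 2003.0,
        lambda r: r["year"] >= 2004.0,
    ]
    probs = selection_probabilities(table, expected_workload)
    sample = draw_sample(table, probs, sample_size=200, seed=42)
    query = lambda r: r["year"] >= 2003.0
    print("estimated SUM(amount):", estimate_sum(table, probs, sample, query, "amount"))
    print("exact SUM(amount):    ", sum(r["amount"] for r in table if query(r)))
```

Weighting by workload frequency is only one way to bias a sample toward an expected workload. It is used here because it keeps the estimator short and unbiased as long as every record has a nonzero probability of being drawn.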