Database system query optimizers use several techniques, such as
histograms and sampling, to estimate the result sizes of operators and
sub-plans (operator trees) and the number of distinct values in their outputs.
Instead of relying on estimates, the invention uses the actual result
sizes and the actual numbers of distinct values in the outputs of
sub-plans encountered by the optimizer. This is achieved by optimizing
the query in phases. In each phase, the optimizer records each newly
encountered sub-plan for which a result size and/or distinct value
estimate is required. These sub-plans are executed at the end of the phase to
determine their actual result sizes and the actual number of distinct
values in their outputs. In subsequent phases, the optimizer uses these
actual values when it encounters the same sub-plan again.
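
The phased procedure can be illustrated with a minimal sketch. It is not
the claimed implementation: the toy tables, the left-deep join enumeration,
the cost model (sum of join output sizes), and the names TABLES, execute
and optimize_in_phases are all assumptions made for illustration. Only the
result size feeds the toy cost model here; the distinct count is recorded
alongside it but not otherwise used.

```python
from itertools import permutations

# Hypothetical toy data: table name -> list of join-key values.
TABLES = {
    "a": [1, 1, 2, 3],
    "b": [1, 2, 2, 4],
    "c": [2, 2, 2, 5],
}


def execute(plan):
    """Run a sub-plan and return its output rows (join keys only)."""
    if isinstance(plan, str):  # base table scan
        return list(TABLES[plan])
    left, right = plan  # equi-join of two sub-plans on the single key
    right_rows = execute(right)
    return [l for l in execute(left) for r in right_rows if l == r]


def optimize_in_phases(tables):
    """Pick a left-deep join order using only measured sub-plan statistics."""
    stats = {}  # sub-plan -> (actual result size, actual distinct count)
    phases = 0
    while True:
        phases += 1
        pending = set()  # sub-plans first encountered in this phase
        best_plan, best_cost = None, None
        for order in permutations(tables):
            plan, cost, complete = order[0], 0, True
            for table in order[1:]:
                plan = (plan, table)
                if plan not in stats:
                    # No actual value yet: record the sub-plan and stop
                    # costing this candidate until a later phase.
                    pending.add(plan)
                    complete = False
                    break
                cost += stats[plan][0]  # reuse the measured result size
            if complete and (best_cost is None or cost < best_cost):
                best_plan, best_cost = plan, cost
        if not pending:
            return best_plan, best_cost, stats, phases
        # End of phase: execute the recorded sub-plans and keep the
        # actual result size and distinct-value count of each.
        for sub_plan in pending:
            rows = execute(sub_plan)
            stats[sub_plan] = (len(rows), len(set(rows)))
```

Calling optimize_in_phases(["a", "b", "c"]) returns the chosen plan as a
nested tuple, its cost, the collected statistics, and the number of phases
run. In this sketch, each phase extends the measured sub-plans by one join,
so later phases cost larger plans from values measured in earlier ones.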