A system, method, and computer-readable medium for sampling data from a relational
database are disclosed. An information processing system chooses rows for sampling
from a table in a relational database, in which data values are arranged
into rows, rows are arranged into pages, and pages are arranged into tables. Pages
are chosen for sampling according to a probability P and rows in a selected page
are chosen for sampling according to a probability R, so that the overall probability
of choosing a row for sampling is Q = PR. The probabilities P and R are based on
the desired precision of estimates computed from the sample, as well as processing
speed. The probabilities P and R are further based on either catalog statistics
of the relational database or a pilot sample of rows from the relational database.
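
The two-level selection described above can be illustrated with a minimal sketch. It assumes a table is modeled as an in-memory list of pages, each page being a list of rows, and that pages and rows are selected by independent random draws. The function name bilevel_bernoulli_sample, its parameters p and r, and the optional rng argument are hypothetical names introduced for illustration only. The sketch does not show how P and R would be derived from catalog statistics or a pilot sample.

```python
import random


def bilevel_bernoulli_sample(pages, p, r, rng=None):
    """Illustrative two-level sampler (hypothetical helper, not from the source).

    pages: iterable of pages, each page an iterable of rows (assumed layout).
    p:     probability of selecting a page.
    r:     probability of selecting a row within a selected page.
    Each row ends up in the sample with overall probability q = p * r.
    """
    rng = rng or random.Random()
    sample = []
    for page in pages:
        # Page-level draw: a skipped page is never scanned row by row.
        if rng.random() < p:
            for row in page:
                # Row-level draw inside a selected page.
                if rng.random() < r:
                    sample.append(row)
    return sample


# Usage: a toy table of 1000 pages with 10 rows each, sampled at q = 0.5 * 0.2.
table = [[(page_id, slot) for slot in range(10)] for page_id in range(1000)]
rows = bilevel_bernoulli_sample(table, p=0.5, r=0.2, rng=random.Random(0))
print(len(rows), "rows sampled out of", sum(len(pg) for pg in table))
```

For a fixed Q, a smaller P with a larger R touches fewer pages, which tends to favor processing speed, while a larger P with a smaller R spreads the sample over more pages, which tends to favor precision when rows within a page are similar. This is the trade-off the choice of P and R is meant to balance.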