A search system generates an index for databases by generatively sampling
the databases and uses that index to identify and formulate queries for
searching the databases. The generated index is referred to as a
domain-attribute index and contains a domain-level index and site-level
indexes. A site-level index for a database maps each site attribute to the
distinct values of that attribute within the database. The domain-level index
for a domain maps each attribute value to the (database, site attribute) pairs
that contain that value. To generate a site-level index for a
database within a given domain, the search system starts with an
initial set of sample data for that domain. The search system
generates sampling queries based on the sample data and submits the
sampling queries to the database. The search system updates the site-level
index based on the sampling results and uses those results to generate further
sampling queries.
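
The sampling loop and the two index levels described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: the names `build_site_index`, `build_domain_index`, `search_books`, and `max_queries` are hypothetical, each sampling query is assumed to be a single keyword, and the database is assumed to return records as attribute-to-value mappings.

```python
from collections import defaultdict, deque

def build_site_index(query_db, seed_values, max_queries=100):
    """Sample one database and return {site attribute: set of distinct values}.

    query_db is a hypothetical callable: keyword -> list of records,
    where each record is a dict mapping attribute name to value.
    """
    site_index = defaultdict(set)
    pending = deque(seed_values)   # values still to be used as sampling queries
    seen = set(seed_values)        # values already queued, to avoid repeats
    issued = 0
    while pending and issued < max_queries:
        keyword = pending.popleft()
        issued += 1
        for record in query_db(keyword):
            for attribute, value in record.items():
                site_index[attribute].add(value)
                if value not in seen:
                    # Newly discovered values become further sampling queries.
                    seen.add(value)
                    pending.append(value)
    return dict(site_index)

def build_domain_index(site_indexes):
    """Invert site-level indexes into {value: set of (database, attribute)}."""
    domain_index = defaultdict(set)
    for database, site_index in site_indexes.items():
        for attribute, values in site_index.items():
            for value in values:
                domain_index[value].add((database, attribute))
    return dict(domain_index)

# Toy database standing in for a searchable site (illustrative data only).
BOOK_RECORDS = [
    {"title": "Emma", "author": "Austen"},
    {"title": "Persuasion", "author": "Austen"},
    {"title": "Bleak House", "author": "Dickens"},
]

def search_books(keyword):
    """Return every record in which some attribute equals the keyword."""
    return [r for r in BOOK_RECORDS if keyword in r.values()]

site_index = build_site_index(search_books, seed_values=["Emma"])
domain_index = build_domain_index({"books": site_index})
```

Starting from the single seed value "Emma", the loop discovers "Austen" and then "Persuasion", but never reaches the "Dickens" record, which shows that an index built by sampling covers only what the issued queries happen to reach. A lookup such as `domain_index["Austen"]` then indicates which database and site attribute a query for that value could be directed at.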