Methods, systems, and computer instructions on computer-readable media are
disclosed for optimizing a query using a first join path, a second
join path, and an optimizer, to efficiently provide high-quality
information from multiple large databases. The methods and systems
include evaluating a schema graph that identifies the join paths between a
field X and a field Y, and a value X=x, to identify the top-few values
Y=y that are reachable from the specified X=x value when using the join
paths. Each data path that instantiates the schema join paths can be
scored and evaluated as to the quality of the data with respect to
specified integrity constraints to alleviate data quality problems.
Agglomerative scoring methodologies can be implemented to compute
high-quality information in the form of the top-few answers to a specified
problem, as requested by the query.
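
The flow described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the function names, the toy relations, the integrity constraint (each join step is assumed to be functional, so a step with n matches contributes a factor of 1/n), and the agglomeration rule (summing path scores per Y value) are all assumptions chosen for illustration.

```python
from collections import defaultdict
import heapq


def build_index(relation):
    """Map each left value to the set of right values it joins with."""
    index = defaultdict(set)
    for left, right in relation:
        index[left].add(right)
    return index


def scored_data_paths(x, join_path):
    """Return (y, score) for every data path that instantiates join_path from x.

    Assumed integrity constraint: each join step should be functional
    (one match per value). A step with n matches multiplies the path
    score by 1/n, so ambiguous joins lower the score.
    """
    frontier = [(x, 1.0)]
    for relation in join_path:
        index = build_index(relation)
        next_frontier = []
        for value, score in frontier:
            matches = index.get(value, ())
            for match in matches:
                next_frontier.append((match, score / len(matches)))
        frontier = next_frontier
    return frontier


def top_few(x, join_paths, k=2):
    """Agglomerate path scores per Y value (here: a sum) and keep the k best."""
    totals = defaultdict(float)
    for join_path in join_paths:
        for y, score in scored_data_paths(x, join_path):
            totals[y] += score
    return heapq.nlargest(k, totals.items(), key=lambda item: item[1])


# Hypothetical toy data: X is a phone number, Y is a customer name.
# Each join path is a chain of binary relations given as (left, right) pairs.
PATH_VIA_ACCOUNT = [
    [("555", "a1"), ("555", "a2")],   # phone -> account (violates the constraint)
    [("a1", "Ann"), ("a2", "Bob")],   # account -> name
]
PATH_VIA_ORDER = [
    [("555", "o1")],                  # phone -> order
    [("o1", "Ann")],                  # order -> name
]

if __name__ == "__main__":
    print(top_few("555", [PATH_VIA_ACCOUNT, PATH_VIA_ORDER], k=2))
```

In this toy data, "Ann" is reached by a clean path through the order relation and by an ambiguous path through the account relation, while "Bob" is reached only by the ambiguous path, so "Ann" ranks first. A different constraint or agglomeration function could be substituted without changing the overall structure.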