Described herein are optimizations and execution strategies for
spreadsheet extensions to SQL. The partitioning of data, as specified in
a spreadsheet clause, provides a way to parallelize the computation of the
spreadsheet and thereby improve scalability. Even if the
partitioning is not explicitly specified in the spreadsheet clause, the
database optimizer can automatically infer the partitioning in some
cases. Efficient hash-based access structures on relations can be used
for symbolic array addressing, enabling fast computation of formulas.
When rewriting SQL statements, formulas whose results are not referenced
in outer blocks can be removed from the spreadsheet clause, thus eliminating
unnecessary computations. The predicates from other query blocks can be
moved inside query blocks with spreadsheet clauses, thus considerably
reducing the amount of data to be processed. Conditions for validity of
this transformation are given.
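
As a rough illustration of two of the ideas above (partition-wise evaluation and hash-based symbolic cell addressing), the following Python sketch may help. It is not the implementation described here: the row layout, the names `build_partitions`, `apply_formula`, and `evaluate`, the sample formula, and the choice to treat a missing cell as 0 are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, year, sales).
ROWS = [
    ("west", "dvd", 2001, 10.0),
    ("west", "dvd", 2002, 12.0),
    ("east", "dvd", 2001, 5.0),
    ("east", "dvd", 2002, 7.0),
]


def build_partitions(rows):
    """Group rows by the partition key (region).

    Each partition is a hash table keyed by the dimension values
    (product, year), so a cell can be addressed symbolically.
    """
    partitions = defaultdict(dict)
    for region, product, year, sales in rows:
        partitions[region][(product, year)] = sales
    return partitions


def apply_formula(cells):
    """Illustrative formula (an assumption, not taken from the text):
    sales['dvd', 2003] = sales['dvd', 2002] + sales['dvd', 2001].
    A missing cell is treated as 0.0 here purely for simplicity.
    """
    cells[("dvd", 2003)] = (
        cells.get(("dvd", 2002), 0.0) + cells.get(("dvd", 2001), 0.0)
    )
    return cells


def evaluate(rows):
    # Partitions share no cells, so each iteration is independent and
    # could in principle be handed to a separate worker.
    return {
        region: apply_formula(cells)
        for region, cells in build_partitions(rows).items()
    }


result = evaluate(ROWS)
```

Because the formula only reads and writes cells within one partition, no coordination between partitions is needed in this sketch; each cell reference is a single hash lookup on the dimension values.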