A database system includes an enhanced technique for performing sorts in
which duplicate records are removed to compact the segments being
sorted. To enhance a query plan, the duplicate record
removal is performed as early in the query plan as possible. By removing
duplicate records early in the query plan, the number of input/output
(I/O) operations is reduced, resulting in more efficient usage of
database system resources. In example implementations, two types of sorts
are performed: a heap sort (to sort successive segments of an input file,
with the sorting associated with concurrent removal of duplicate records
to compact each segment so that a smaller number of I/O accesses is
needed); and a merge sort (in which output files from prior sorting
passes are merged and sorted, with the merge sort process also associated
with the removal of duplicate records to further compact the data
segments and reduce the number of I/O accesses).
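
The two sorting phases described above can be illustrated with a minimal sketch. It is not the implementation described here, only an approximation under stated assumptions: in-memory Python lists stand in for the input file and the intermediate output files, records are plain integers compared by whole value, and the names `sort_segment_dedup`, `merge_dedup`, `sort_distinct`, and `segment_size` are hypothetical.

```python
import heapq
from typing import List, Tuple


def sort_segment_dedup(segment: List[int]) -> List[int]:
    """Heap-sort one segment, dropping duplicates as records leave the heap."""
    heap = list(segment)
    heapq.heapify(heap)
    run: List[int] = []
    while heap:
        record = heapq.heappop(heap)
        # Records pop in ascending order, so a duplicate always equals
        # the record most recently written to the run.
        if not run or run[-1] != record:
            run.append(record)
    return run


def merge_dedup(runs: List[List[int]]) -> List[int]:
    """Merge sorted runs, dropping duplicates that span different runs."""
    merged: List[int] = []
    for record in heapq.merge(*runs):
        if not merged or merged[-1] != record:
            merged.append(record)
    return merged


def sort_distinct(records: List[int], segment_size: int) -> Tuple[List[List[int]], List[int]]:
    """Sort successive segments with early duplicate removal, then merge the runs."""
    # Assumption: each list slice models one segment read from the input file.
    runs = [
        sort_segment_dedup(records[start:start + segment_size])
        for start in range(0, len(records), segment_size)
    ]
    return runs, merge_dedup(runs)


if __name__ == "__main__":
    runs, result = sort_distinct([5, 3, 5, 1, 3, 3, 9, 1], segment_size=4)
    print(runs)    # compacted runs produced by the first phase
    print(result)  # final sorted output without duplicates
```

In this sketch each run is already compacted before the merge phase reads it, which mirrors the stated goal of removing duplicates as early as possible so that later passes handle fewer records.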