The present disclosure is directed to a method for providing an OpenMP
reduction implementation. The method may comprise creating an aggregate
of at least one reduction variable in a parallel region or a work-sharing
construct; defining a pointer variable, the pointer variable pointing to
a dynamic array of the aggregate; creating an initialization routine, an
outlined routine, and a reduction accumulation routine; replacing the
parallel region or the work-sharing construct with a runtime routine, the
runtime routine taking a plurality of arguments including an address of
the initialization routine, an address of the outlined routine, an
address of the reduction accumulation routine, an address of the pointer
variable, and a size of the aggregate; and executing the runtime routine
when the at least one reduction variable is in the parallel region or the
work-sharing construct.