A method and system to optimize throughput of executable program code are
provided. The system comprises a profiler to receive a representation of
a plurality of functions, an aggregator to create a plurality of aggregates,
and a mapper to map the plurality of aggregates to a plurality of processors.
The aggregator may
be configured to create an aggregate for each function from the plurality
of functions, thereby creating the plurality of aggregates; choose an
optimization action, either grouping or duplication, based on the number
of aggregates in the plurality of aggregates, the number of available
processing elements (PEs), and the execution time of each aggregate; and
perform the chosen optimization action.
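
The choice between grouping and duplication may be illustrated by the following minimal Python sketch. The abstract names the inputs to the choice but not the decision rule, so the rule here is an assumption made for illustration only: grouping is chosen while more PEs are required than are available, merging the two aggregates with the shortest execution times, and duplication is chosen while PEs remain unused, adding a copy of the aggregate with the longest effective execution time. The names Aggregate, choose_action, group, duplicate, and optimize are hypothetical and do not appear in the source.

```python
from dataclasses import dataclass


@dataclass
class Aggregate:
    functions: list    # names of the functions contained in this aggregate
    exec_time: float   # profiled execution time of the aggregate
    copies: int = 1    # number of PEs running a copy of this aggregate


def effective_time(agg):
    # Assumption: duplicating an aggregate divides its per-item cost evenly.
    return agg.exec_time / agg.copies


def choose_action(aggregates, num_pes):
    # Assumed rule: compare the PEs required against the PEs available.
    used = sum(a.copies for a in aggregates)
    if used > num_pes:
        return "group"
    if used < num_pes:
        return "duplicate"
    return None


def group(aggregates):
    # Assumed heuristic: merge the two aggregates with the shortest times.
    ordered = sorted(aggregates, key=lambda a: a.exec_time)
    first, second = ordered[0], ordered[1]
    merged = Aggregate(first.functions + second.functions,
                       first.exec_time + second.exec_time)
    return [merged] + ordered[2:]


def duplicate(aggregates):
    # Assumed heuristic: give the current bottleneck one more PE.
    slowest = max(aggregates, key=effective_time)
    slowest.copies += 1
    return aggregates


def optimize(function_times, num_pes):
    if num_pes < 1:
        raise ValueError("at least one processing element is required")
    if not function_times:
        return []
    # Create one aggregate for each profiled function.
    aggregates = [Aggregate([name], t) for name, t in function_times.items()]
    while True:
        action = choose_action(aggregates, num_pes)
        if action == "group":
            aggregates = group(aggregates)
        elif action == "duplicate":
            aggregates = duplicate(aggregates)
        else:
            return aggregates
```

Under these assumptions, three functions with execution times of 2.0, 6.0, and 1.0 on two PEs are grouped so that the two shortest functions share one PE, while the same functions on five PEs leave the aggregates unmerged and assign the two spare PEs to copies of the slowest aggregate.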