The invention is a system and method for executing a program that comprises a
plurality of basic blocks on a computer system that comprises a plurality of processing
elements. One processing element of the plurality of processing elements generates
a branch instruction and sends the branch instruction to the plurality
of processing elements. Each processing element of the plurality of processing
elements then independently branches to a target of the branch instruction
when that processing element receives the sent branch instruction. At
least one processing element of the plurality of processing elements receives the
branch instruction at a time later than another processing element of the plurality
of processing elements.
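
The behavior described above can be illustrated with a minimal simulation sketch. It is not the claimed implementation: the names `ProcessingElement`, `broadcast_branch`, and `run`, the per-element `latency` values, the block labels `B0` and `B1`, and the cycle-based timing model are all assumptions introduced only for illustration. One element is assumed to generate the branch at cycle 0, the branch is sent to every element, and each element branches on its own when its copy arrives, so arrival times may differ between elements.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ProcessingElement:
    pe_id: int
    latency: int                       # assumed: cycles for a sent branch to reach this element
    block: str = "B0"                  # basic block this element is currently executing
    branched_at: Optional[int] = None  # cycle at which this element took the branch
    inbox: List[Tuple[int, str]] = field(default_factory=list)  # (arrival_cycle, target)

    def step(self, cycle: int) -> None:
        # Branch independently as soon as a sent branch instruction has arrived.
        for arrival, target in list(self.inbox):
            if arrival <= cycle:
                self.inbox.remove((arrival, target))
                self.block = target
                self.branched_at = cycle


def broadcast_branch(elements: List[ProcessingElement], target: str, cycle: int) -> None:
    # Send the same branch instruction to every element; each copy arrives
    # after that element's own (assumed) delivery latency.
    for pe in elements:
        pe.inbox.append((cycle + pe.latency, target))


def run(latencies: List[int], target: str = "B1", cycles: int = 8) -> List[ProcessingElement]:
    elements = [ProcessingElement(i, lat) for i, lat in enumerate(latencies)]
    # Assumption: the branch is generated by one element (element 0) at cycle 0.
    broadcast_branch(elements, target, cycle=0)
    for cycle in range(cycles):
        for pe in elements:
            pe.step(cycle)
    return elements


if __name__ == "__main__":
    for pe in run([0, 1, 3]):
        print(pe.pe_id, pe.block, pe.branched_at)
```

With the assumed latencies `[0, 1, 3]`, every element ends up in the same target block, but the third element receives the branch and takes it later than the others, which corresponds to the last sentence above.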