Ordering instructions, which specify the execution order of other
instructions, improve throughput in a pipelined multiprocessor. Hardware,
in conjunction with compiler directives, allows memory write operations
local to a CPU to occur in an arbitrary order, while constraining
shared memory operations to occur in a specified order. Multiple sets of
instructions are provided, in which the order of execution is maintained
through CPU registers, through write buffers combined with sequence
numbers assigned to the instructions, or through a hierarchical
ordering system. The system ensures that an earlier
designated instruction has reached a specified state of execution before
a later instruction reaches a specified state of execution. The
ordering of operations allows memory operations local to a CPU to proceed
alongside other memory operations that are not affected by them.
Accordingly, the freedom granted to local memory operations, combined
with the ordering directives applied to global memory operations,
improves throughput in a shared-memory multiprocessor computing
environment.
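
The write-buffer variant described above can be illustrated with a small software model. The following Python sketch is only an assumption-laden toy, not the hardware mechanism itself: the names Write, WriteBuffer, issue, and drain are hypothetical, each write is assumed to target a distinct address, and the "specified order" for shared writes is assumed to be issue order as recorded by the sequence number. Local writes may retire in any order, while a shared write may retire only when it is the oldest shared write still pending.

```python
import random
from dataclasses import dataclass


@dataclass
class Write:
    seq: int      # sequence number assigned when the write is issued
    addr: str     # target address (assumed distinct per write in this toy model)
    value: int
    shared: bool  # True if the write targets shared (global) memory


class WriteBuffer:
    """Toy model: local writes retire freely, shared writes retire in sequence order."""

    def __init__(self):
        self._next_seq = 0
        self._pending = []

    def issue(self, addr, value, shared):
        # Tag each write with the next sequence number and queue it.
        write = Write(self._next_seq, addr, value, shared)
        self._next_seq += 1
        self._pending.append(write)
        return write

    def _eligible(self):
        # A local write is always eligible to retire. A shared write is
        # eligible only if it is the oldest shared write still pending.
        shared_seqs = [w.seq for w in self._pending if w.shared]
        oldest_shared = min(shared_seqs) if shared_seqs else None
        return [w for w in self._pending
                if not w.shared or w.seq == oldest_shared]

    def drain(self, rng):
        # Retire writes one at a time, picking arbitrarily among eligible ones
        # to mimic the freedom the hardware has for local writes.
        retired = []
        while self._pending:
            write = rng.choice(self._eligible())
            self._pending.remove(write)
            retired.append(write)
        return retired


if __name__ == "__main__":
    buf = WriteBuffer()
    for i in range(6):
        buf.issue(f"addr{i}", i, shared=(i % 2 == 0))
    for w in buf.drain(random.Random()):
        kind = "shared" if w.shared else "local"
        print(w.seq, kind, w.addr)
```

Running the script prints a retirement order that varies from run to run, but the shared writes always appear in increasing sequence-number order, while the local writes are interleaved wherever the random choice places them.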