A microarchitecture and instruction set are disclosed that support
multiple, simultaneously executing threads. The approach is described as
applied to a recently developed
microarchitecture called "WaveScalar." The WaveScalar compiler breaks the
control flow graph of a program into pieces called waves, each of which
contains instructions that are partially ordered (i.e., a wave contains no
back-edges) and is entered by control at a single point. Certain
aspects of the present approach are also generally applicable to
executing multiple threads on a more conventional microarchitecture. In
one aspect of this approach, instructions are provided that enable and
disable wave-ordered memory. Additional memory access instructions bypass
wave-ordered memory, exposing additional parallelism. Also, a
lightweight interthread synchronization mechanism is employed that models
hardware queue locks. Finally, a simple fence instruction allows
applications to handle relaxed memory consistency.
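
The two properties that define a wave (no back-edges among its
instructions, and a single point at which control enters) can be
illustrated with a small check over a control flow graph. The following
is only a minimal sketch under stated assumptions, not the partitioning
procedure used by the WaveScalar compiler: the graph is assumed to be a
dictionary mapping each block name to its successors, and the names
is_wave, cfg, and the block labels are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical representation: block name -> list of successor block names.
Graph = Dict[str, List[str]]


def is_wave(cfg: Graph, blocks: Set[str]) -> bool:
    """Check whether `blocks` has a single entry point and no back-edges."""
    if not blocks:
        return False

    # Collect the predecessors of every block in the whole graph.
    preds: Dict[str, Set[str]] = {b: set() for b in cfg}
    for src, succs in cfg.items():
        for dst in succs:
            preds.setdefault(dst, set()).add(src)

    # Property 1: control enters at a single point. A block counts as an
    # entry if it is reached from outside the region, or if it has no
    # predecessors at all (the program entry).
    entries = [
        b for b in blocks
        if not preds.get(b) or any(p not in blocks for p in preds[b])
    ]
    if len(entries) != 1:
        return False

    # Property 2: no back-edges, i.e. the region is acyclic. Kahn's
    # algorithm visits every block only if the induced subgraph has no cycle.
    indegree = {b: sum(1 for p in preds.get(b, ()) if p in blocks) for b in blocks}
    ready = deque(b for b in blocks if indegree[b] == 0)
    visited = 0
    while ready:
        block = ready.popleft()
        visited += 1
        for succ in cfg.get(block, []):
            if succ in blocks:
                indegree[succ] -= 1
                if indegree[succ] == 0:
                    ready.append(succ)
    return visited == len(blocks)


# A hypothetical four-block graph containing one loop.
cfg: Graph = {
    "entry": ["loop_head"],
    "loop_head": ["body", "exit"],
    "body": ["loop_head"],  # back-edge to the loop header
    "exit": [],
}
```

In this sketch, the region {"loop_head", "body"} is rejected because it
contains the back-edge, and {"body", "exit"} is rejected because control
can enter it at two different blocks, whereas {"loop_head", "exit"} is
accepted.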