A computer system includes a processor capable of executing a plurality of
N threads of instructions, N being an integer greater than one, with a
set of global registers visible to each of the plurality of threads and a
plurality of busy bit memory elements used to signal whether or not a
register is in use by a thread. The processor includes logic to stall a
read from a global register if the thread reading the global register is a
speculative thread and the busy bits for prior threads are set. The
processor might also include a speculative load address memory, into
which addresses of speculative loads from speculative threads are entered,
and logic to compare addresses of stores from nonspeculative threads with
the addresses in the speculative load address memory and to invalidate the
speculative threads corresponding to the matching speculative load
addresses stored in that memory. In an efficient implementation, aliasing load
instructions can be distinct from nonaliasing load instructions, whereby
addresses of aliasing load instructions are selectively stored in the
speculative load address memory.
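
The behavior summarized above can be illustrated with a minimal software sketch. It is a toy model, not the hardware described. Several things in it are assumptions made only for illustration: the class name SpeculationModel and its method names are hypothetical, thread 0 is taken to be the single nonspeculative thread, "prior threads" is read as threads with a lower index, busy bits are kept per thread and per register, and a load is marked as aliasing through a boolean flag.

```python
class SpeculationModel:
    """Toy model of busy bits and a speculative load address memory.

    All names here are hypothetical; the source does not define an API.
    """

    def __init__(self, num_threads, num_globals, nonspeculative=0):
        if num_threads <= 1:
            raise ValueError("N must be an integer greater than one")
        self.num_threads = num_threads
        self.num_globals = num_globals
        # Assumption: exactly one nonspeculative thread, identified by index.
        self.nonspeculative = nonspeculative
        # busy[t][r] is True while thread t has global register r in use.
        self.busy = [[False] * num_globals for _ in range(num_threads)]
        # Speculative load address memory: address -> set of thread ids.
        self.spec_loads = {}
        # Threads invalidated so far by conflicting stores.
        self.invalidated = set()

    def is_speculative(self, thread):
        return thread != self.nonspeculative

    def must_stall_read(self, thread, reg):
        """Stall a speculative read while any prior thread's busy bit is set."""
        if not self.is_speculative(thread):
            return False
        # Assumption: "prior" threads are those with a lower index.
        return any(self.busy[t][reg] for t in range(thread))

    def load(self, thread, address, aliasing=True):
        """Record the address only for aliasing loads from speculative threads."""
        if self.is_speculative(thread) and aliasing:
            self.spec_loads.setdefault(address, set()).add(thread)

    def store(self, thread, address):
        """Compare a nonspeculative store address against recorded loads.

        Returns the set of speculative threads invalidated by this store.
        """
        if self.is_speculative(thread):
            return set()
        hit = self.spec_loads.pop(address, set())
        self.invalidated |= hit
        return hit


# Usage: thread 0 holds global register 3, so speculative thread 2 stalls.
model = SpeculationModel(num_threads=4, num_globals=8)
model.busy[0][3] = True
stalled = model.must_stall_read(2, 3)

# Thread 2 performs an aliasing load; a later store by thread 0 to the
# same address invalidates thread 2.
model.load(2, 0x100)
victims = model.store(0, 0x100)
```

In this sketch a nonaliasing load (aliasing=False) never enters the speculative load address memory, so a later nonspeculative store to that address invalidates nothing; this mirrors the selective storage mentioned for the efficient implementation.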