A prefetching technique referred to as future execution (FE) dynamically
creates a prefetching thread for each active thread in a processor by
simply sending a copy of all committed, register-writing instructions in
a primary thread to an otherwise idle processor. On the way to the second
processor, a value predictor replaces each predictable instruction with a
load immediate instruction, where the immediate is the predicted result
that the instruction is likely to produce during its nth next
dynamic execution. Executing this modified instruction stream (i.e., the
prefetching thread) in another processor allows computation of the future
results of the instructions that are not directly predictable. This
issues prefetches into the shared memory hierarchy, thereby reducing
the primary thread's memory access time and speeding up its execution.
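
The transformation can be pictured as a filter over the primary thread's committed, register-writing instruction stream. The following C++ sketch is only illustrative and is not taken from this description: the Instr record, the StridePredictor (a simple stride-based value predictor, one of several possible predictor choices), the prefetch distance n, and buildFutureStream are hypothetical names, and in the actual scheme the rewriting is performed in hardware on the way to the second processor.

#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical record of a committed, register-writing instruction that is
// copied from the primary thread toward the otherwise idle processor.
struct Instr {
    enum Op { GENERIC, LOAD_IMMEDIATE } op = GENERIC;
    std::uint64_t pc = 0;        // address of the instruction
    int           destReg = 0;   // register the instruction writes
    std::int64_t  immediate = 0; // payload when op == LOAD_IMMEDIATE
    std::int64_t  result = 0;    // value produced when it committed
};

// Illustrative stride value predictor keyed by PC: it predicts the result of
// the nth next dynamic execution as last value + n * stride, and reports a
// prediction only after two consecutive executions show the same stride.
class StridePredictor {
    struct Entry { std::int64_t last = 0, stride = 0; bool seen = false, stable = false; };
    std::unordered_map<std::uint64_t, Entry> table_;
public:
    bool observeAndPredict(const Instr& in, int n, std::int64_t* predicted) {
        Entry& e = table_[in.pc];
        if (e.seen) {
            std::int64_t stride = in.result - e.last;
            e.stable = (stride == e.stride);
            e.stride = stride;
        }
        e.last = in.result;
        e.seen = true;
        if (!e.stable) return false;
        *predicted = in.result + static_cast<std::int64_t>(n) * e.stride;
        return true;
    }
};

// Rewrite the committed, register-writing stream into the prefetching thread.
std::vector<Instr> buildFutureStream(const std::vector<Instr>& committed,
                                     StridePredictor& vp, int n) {
    std::vector<Instr> future;
    future.reserve(committed.size());
    for (Instr in : committed) {          // copy: the primary stream is untouched
        std::int64_t value;
        if (vp.observeAndPredict(in, n, &value)) {
            // Predictable result: replace the instruction with a load immediate
            // of the value it is expected to produce n executions from now.
            in.op = Instr::LOAD_IMMEDIATE;
            in.immediate = value;
        }
        // Instructions that are not directly predictable are forwarded
        // unchanged, so the idle processor computes their future results
        // from the predicted input values.
        future.push_back(in);
    }
    return future;   // executing this stream on the idle processor issues the prefetches
}

In use, the committed trace of the primary thread would be fed through buildFutureStream with the chosen prefetch distance n and the resulting stream dispatched to the idle processor; the loads that remain in that stream access the shared memory hierarchy early and thereby act as prefetches for the primary thread.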