A processor is provided that includes an execution unit for executing
instructions and a replay system for replaying instructions that have
not executed properly. The replay system is coupled to the execution unit
and includes a checker for determining whether each instruction has
executed properly and a plurality of replay queues or replay queue
sections coupled to the checker for temporarily storing one or more
instructions for replay. In one embodiment, each thread-specific replay
queue section may store a long latency instruction for its thread until
the long latency instruction is ready to be executed (e.g., until data
for a load instruction has been retrieved from external memory). By
storing the long latency instruction and its dependents in the replay
queue section of a thread that has stalled, execution resources are made
available to speed execution of other threads that have not stalled.
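
The arrangement described above can be illustrated with a minimal software sketch. It is a toy model only, not the claimed hardware: the names `Instruction`, `ReplaySystem`, `check`, and `data_returned` are hypothetical, and the sketch assumes that an instruction fails the checker when it is a long latency instruction whose data has not yet arrived, or when it depends on such an instruction in a stalled thread.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Instruction:
    thread: int                    # thread that issued the instruction
    name: str
    long_latency: bool = False     # e.g. a load waiting on external memory
    depends_on_long: bool = False  # consumes the long latency result


@dataclass
class ReplaySystem:
    num_threads: int
    data_ready: set = field(default_factory=set)  # threads whose data has arrived
    stalled: set = field(default_factory=set)     # threads waiting on a long latency op
    sections: dict = field(init=False)            # one replay queue section per thread

    def __post_init__(self):
        self.sections = {t: deque() for t in range(self.num_threads)}

    def check(self, insn):
        """Checker: return True if insn executed properly; otherwise park it
        in the replay queue section belonging to its own thread."""
        t = insn.thread
        if insn.long_latency and t not in self.data_ready:
            self.stalled.add(t)
        failed = t in self.stalled and (insn.long_latency or insn.depends_on_long)
        if failed:
            self.sections[t].append(insn)
        return not failed

    def data_returned(self, t):
        """Data for thread t has arrived: unload its section, in program
        order, so the parked instructions can be replayed."""
        self.data_ready.add(t)
        self.stalled.discard(t)
        released = list(self.sections[t])
        self.sections[t].clear()
        return released


rs = ReplaySystem(num_threads=2)
load = Instruction(0, "load", long_latency=True)
add = Instruction(0, "add", depends_on_long=True)
mul = Instruction(1, "mul")

# Thread 0 stalls on the load; thread 1 keeps using the execution unit.
results = [rs.check(load), rs.check(add), rs.check(mul)]
replayed = rs.data_returned(0)
```

In this sketch the load and its dependent wait in thread 0's section instead of recirculating through the execution unit, so the instruction from thread 1 passes the checker without delay. Once `data_returned(0)` is called, the parked instructions are released in their original order and pass the checker when replayed.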