One embodiment of the present invention provides a system that facilitates
locked prefetch scheduling in general cyclic regions of a computer
program. The system operates by first receiving source code for the
computer program and compiling the source code into intermediate code.
The system then performs trace detection on the intermediate code.
Next, the system inserts prefetch instructions and corresponding locks
into the intermediate code. Finally, the system generates executable code
from the intermediate code, wherein a lock for a given prefetch
instruction prevents subsequent prefetches from being issued until the
data value for the given prefetch instruction returns.
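
As a rough illustration only, and not the claimed implementation, the
prefetch insertion step and the lock semantics described above might be
modeled as follows in Python. The instruction representation (Instr), the
function names, the fixed prefetch distance, the fixed memory latency, and
the in-order, one-instruction-per-cycle timing model are all assumptions
introduced for this sketch; trace detection and executable code generation
are omitted.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    """Hypothetical intermediate-code instruction (an assumption of this sketch)."""
    op: str          # "load", "prefetch", "lock", or any other opcode
    addr: str = ""   # symbolic address used by load/prefetch/lock


def insert_locked_prefetches(trace, distance=2):
    """Insert a prefetch and its lock `distance` instructions ahead of each load.

    `trace` is a straight-line list of Instr, standing in for one detected trace.
    Positions are clamped to the start of the trace.
    """
    pending = {}  # original index -> addresses to prefetch just before it
    for i, ins in enumerate(trace):
        if ins.op == "load":
            pending.setdefault(max(0, i - distance), []).append(ins.addr)

    out = []
    for i, ins in enumerate(trace):
        for addr in pending.get(i, []):
            out.append(Instr("prefetch", addr))
            out.append(Instr("lock", addr))  # lock paired with the prefetch above
        out.append(ins)
    return out


def simulate_prefetch_issue(code, latency=3):
    """Return (addr, cycle) for each prefetch under an assumed timing model.

    Assumptions: one instruction issues per cycle, in order; a lock is a
    zero-cost marker; a prefetch's data returns `latency` cycles after issue.
    A lock holds back later prefetches until its prefetch's data has returned.
    """
    cycle = 0
    locked_until = 0   # earliest cycle at which the next prefetch may issue
    issued_at = {}     # addr -> cycle of its most recent prefetch
    issue_log = []
    for ins in code:
        if ins.op == "lock":
            locked_until = max(locked_until, issued_at[ins.addr] + latency)
            continue   # marker only, consumes no cycle in this model
        if ins.op == "prefetch":
            cycle = max(cycle, locked_until)  # stall until the lock releases
            issued_at[ins.addr] = cycle
            issue_log.append((ins.addr, cycle))
        cycle += 1
    return issue_log


# Example trace: two loads separated by unrelated work.
trace = [Instr("add"), Instr("load", "A"), Instr("add"), Instr("load", "B")]
code = insert_locked_prefetches(trace, distance=2)
issues = simulate_prefetch_issue(code, latency=3)
print([i.op for i in code])
print(issues)
```

In this model the second prefetch would otherwise be ready to issue before
the first one's data has returned; the lock delays it until that data
arrives, which is the behavior the paragraph above attributes to a lock.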