A method to efficiently pre-fetch and batch compiler-assisted software
cache accesses is provided. The method reduces the overhead associated
with software cache directory accesses. When data is pre-fetched, the local
memory address of the cache line that holds it is itself cached, for example
in a register or a well-known location in local memory. A later access to
that data can then skip address translation and the software cache lookup
and read the data directly from the software cache using the cached local
memory address. This saves
processor cycles that would otherwise be required to perform the address
translation a second time when the data is used. Moreover, the system and
method decouple software cache accesses from address translation, which
increases the overlap between computation and communication.
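
The idea can be illustrated with a minimal C sketch. It is not the claimed implementation: the direct-mapped layout, the line size, the `memcpy` standing in for a DMA transfer, and every name (`sc_translate`, `sc_prefetch_batch`, `sum_cached`, `directory_lookups`) are assumptions made for illustration. The sketch batches the directory accesses for several global addresses up front, keeps the resulting local addresses, and later reads the data through those addresses without touching the directory again.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE   128u        /* bytes per cache line (assumed) */
#define NUM_LINES   64u         /* lines in the software cache (assumed) */
#define GLOBAL_SIZE (1u << 16)  /* size of the simulated system memory */

static uint8_t  global_mem[GLOBAL_SIZE];           /* stands in for system memory */
static uint8_t  local_store[NUM_LINES][LINE_SIZE]; /* cache lines in local memory */
static uint32_t dir_tag[NUM_LINES];                /* software cache directory */
static int      dir_valid[NUM_LINES];
static unsigned directory_lookups;                 /* counts directory accesses */

/* Translate a global address to a local address through the directory,
 * fetching the line on a miss. This is the costly step to avoid repeating. */
static uint8_t *sc_translate(uint32_t gaddr)
{
    assert(gaddr < GLOBAL_SIZE);
    uint32_t line = gaddr / LINE_SIZE;
    uint32_t idx  = line % NUM_LINES;

    directory_lookups++;
    if (!dir_valid[idx] || dir_tag[idx] != line) {
        /* memcpy stands in for the DMA transfer into local memory */
        memcpy(local_store[idx], &global_mem[line * LINE_SIZE], LINE_SIZE);
        dir_tag[idx]   = line;
        dir_valid[idx] = 1;
    }
    return &local_store[idx][gaddr % LINE_SIZE];
}

/* Batch the pre-fetches for n global addresses and cache each resulting
 * local address in laddrs (the "register or well-known location"). */
static void sc_prefetch_batch(const uint32_t *gaddrs, uint8_t **laddrs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        laddrs[i] = sc_translate(gaddrs[i]);
}

/* Later use: read the data through the cached local addresses only.
 * No address translation and no directory access happens here. */
static unsigned sum_cached(uint8_t *const *laddrs, size_t n)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += *laddrs[i];
    return sum;
}
```

In this sketch a cached local address stays valid only while its line remains resident; a real scheme would need to guarantee that pre-fetched lines are not evicted before they are used.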