Methods and an apparatus for stride profiling a software application are
disclosed. An example system uses a hardware performance counter to
report instruction addresses and data addresses associated with memory
access instructions triggered by some event, such as a data cache miss.
When the same instruction address is associated with more than one data
address, the difference between two of those data addresses is recorded. When
two or more of these data address differences are recorded for the same
instruction, the system determines a stride associated with the
instruction to be the greatest common divisor of the two or more
differences. This stride may be used by a compiler to optimize data cache
prefetching. In addition, the overhead associated with monitoring the
addresses of data cache misses may be reduced by cycling between an
inspection phase and a skipping phase. More data cache misses are
monitored during the inspection phase than during the skipping phase.
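
The stride determination and the phase cycling described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the disclosed implementation: the class name `StrideProfiler`, the method names `on_miss` and `stride`, and the parameters `inspect_count` and `skip_count` are hypothetical. The sketch also assumes that the difference is taken between successive data addresses seen for an instruction, and that no misses at all are monitored during the skipping phase (the text only requires that fewer be monitored). The hardware performance counter is replaced by direct calls to `on_miss`.

```python
from collections import defaultdict
from math import gcd


class StrideProfiler:
    """Hypothetical stride profiler fed with (instruction, data) address pairs."""

    def __init__(self, inspect_count=4, skip_count=2):
        if inspect_count < 1 or skip_count < 0:
            raise ValueError("inspect_count must be >= 1 and skip_count >= 0")
        self.inspect_count = inspect_count  # misses monitored per cycle
        self.skip_count = skip_count        # misses ignored per cycle (assumption)
        self._position = 0                  # position within the current cycle
        self._last_addr = {}                # instruction address -> last data address
        self._diffs = defaultdict(list)     # instruction address -> recorded differences

    def on_miss(self, instr_addr, data_addr):
        """Handle one reported data cache miss."""
        cycle = self.inspect_count + self.skip_count
        inspecting = self._position < self.inspect_count
        self._position = (self._position + 1) % cycle
        if not inspecting:
            return  # skipping phase: this miss is not monitored

        prev = self._last_addr.get(instr_addr)
        if prev is not None and prev != data_addr:
            # Same instruction, different data address: record the difference.
            self._diffs[instr_addr].append(abs(data_addr - prev))
        self._last_addr[instr_addr] = data_addr

    def stride(self, instr_addr):
        """Return the GCD of the recorded differences, or None if fewer than two."""
        diffs = self._diffs.get(instr_addr, [])
        if len(diffs) < 2:
            return None
        result = 0
        for d in diffs:
            result = gcd(result, d)
        return result


# Usage: one instruction walking an array in 64-byte steps.
profiler = StrideProfiler(inspect_count=3, skip_count=2)
for i in range(10):
    profiler.on_miss(0x400, 0x1000 + 64 * i)
print(profiler.stride(0x400))  # 64
```

In this sketch, misses that fall in the skipping phase leave gaps, so some recorded differences are multiples of the true step (192 bytes rather than 64 in the usage above). Taking the greatest common divisor of the differences still yields 64.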