A mechanism is provided for minimizing effective memory latency, without incurring unnecessary cost, through fine-grained software-directed data prefetching that integrates high-level and low-level code analysis and optimization. The mechanism identifies and classifies data streams, identifies the data most likely to incur a cache miss, takes the effectiveness of hardware prefetching into account to determine the proper number of streams to be prefetched, applies data prefetching selectively to different types of streams in order to eliminate redundant prefetches and avoid cache pollution, and uses high-level transformations together with integrated lower-level cost analysis in the instruction scheduler to schedule prefetch instructions effectively.
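By way of illustration only, the following C sketch shows the kind of code a compiler applying such a mechanism might emit for a simple unit-stride load stream. The function name sum_stream, the prefetch distance PF_DIST, and the cache-line constant LINE_DOUBLES are illustrative assumptions rather than elements of the disclosure, and __builtin_prefetch stands in for whatever machine-level prefetch (touch) instruction the target provides.

/* Illustrative sketch, not the disclosed implementation: a compiler applying
 * a mechanism of this kind might transform a streaming reduction loop into
 * the form below.  The access a[i] is classified as a unit-stride load
 * stream, the prefetch distance is chosen by a lower-level cost analysis
 * (cache-line size, loop-body latency), and at most one prefetch is issued
 * per cache line to avoid redundant prefetches and cache pollution. */

#include <stddef.h>

#define LINE_DOUBLES 8   /* assumed 64-byte cache line / sizeof(double) */
#define PF_DIST      16  /* assumed prefetch distance, in iterations    */

double sum_stream(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* Prefetch ahead of the point of use; guard against running past
         * the end of the array and issue only once per cache line. */
        if (i % LINE_DOUBLES == 0 && i + PF_DIST < n)
            __builtin_prefetch(&a[i + PF_DIST], /*rw=*/0, /*locality=*/3);
        sum += a[i];
    }
    return sum;
}

In this sketch the once-per-cache-line guard plays the role of eliminating redundant prefetches, while the choice of PF_DIST reflects the kind of lower-level cost analysis that the instruction scheduler would use when placing prefetch instructions.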