A microprocessor is provided with multiple stream prefetch engines, each
of which executes a stream prefetch instruction to prefetch a complex
data stream specified by the instruction, in a manner synchronized with
the program's execution of loads from the stream. Each stream prefetch
engine stays at least a fetch-ahead distance, specified in the
instruction, ahead of the program loads, which may access the stream in
random order. The instruction
specifies a level in the cache hierarchy to prefetch into, a locality
indicator to specify the urgency and ephemerality of the stream, a stream
prefetch priority, a TLB miss policy, a page fault miss policy, a
protection violation policy, and a hysteresis value that specifies a
minimum number of bytes to prefetch when the stream prefetch engine
resumes prefetching. The memory subsystem includes one of the following:
a separate TLB for stream prefetches; a joint TLB backing both the stream
prefetch TLB and the load/store TLB; or a separate TLB for each prefetch
engine.
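
The interplay of the fetch-ahead distance and the hysteresis value can be illustrated with a minimal C sketch. It is a software model only, not the hardware described above, and it rests on assumptions that are not in the abstract: the stream is treated as a simple linear byte range, the engine is assumed to resume whenever its lead over the program load drops below the fetch-ahead distance, and every name (`stream_prefetch_desc_t`, `stream_engine_t`, `bytes_to_prefetch`, the `fault_policy_t` values, the field widths) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy values; the abstract names the policies but not their encodings. */
typedef enum { POLICY_ABORT_STREAM, POLICY_HANDLE_NORMALLY } fault_policy_t;

/* Hypothetical layout of the operands the instruction is said to specify. */
typedef struct {
    uint8_t        cache_level;           /* level of the cache hierarchy to prefetch into */
    uint8_t        locality;              /* urgency/ephemerality indicator */
    uint8_t        priority;              /* stream prefetch priority */
    fault_policy_t tlb_miss_policy;
    fault_policy_t page_fault_policy;
    fault_policy_t prot_violation_policy;
    uint32_t       fetch_ahead;           /* bytes the engine must stay ahead of loads */
    uint32_t       hysteresis;            /* minimum bytes to prefetch on resume */
} stream_prefetch_desc_t;

/* Simplified engine state: the stream is modeled as one linear byte range (assumption). */
typedef struct {
    stream_prefetch_desc_t desc;
    uint64_t stream_len;       /* total stream length in bytes */
    uint64_t prefetch_offset;  /* stream offset up to which data has been prefetched */
} stream_engine_t;

/*
 * Returns how many bytes the engine should prefetch next, given the stream
 * offset of the program's most recent load. Returns 0 while the engine is
 * already at least fetch_ahead bytes ahead (it stays suspended).
 */
uint64_t bytes_to_prefetch(const stream_engine_t *e, uint64_t load_offset)
{
    assert(e != NULL);

    /* Offset the engine must reach to be fetch_ahead bytes ahead of the load. */
    uint64_t target = load_offset + e->desc.fetch_ahead;
    if (e->prefetch_offset >= target) {
        return 0;  /* far enough ahead: remain suspended */
    }

    /* Resume: fetch enough to restore the lead, but never less than hysteresis. */
    uint64_t need = target - e->prefetch_offset;
    if (need < e->desc.hysteresis) {
        need = e->desc.hysteresis;
    }

    /* Never run past the end of the stream. */
    uint64_t remaining = (e->prefetch_offset < e->stream_len)
                             ? e->stream_len - e->prefetch_offset
                             : 0;
    return need < remaining ? need : remaining;
}
```

In this model, a hysteresis value larger than the shortfall makes the engine fetch a larger burst each time it resumes, instead of waking repeatedly for a few bytes at a time.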