A method and apparatus for a coherence mechanism that supports a
distributed memory programming model in which processors each maintain
their own memory area and exchange data with one another. A hierarchical
programming model is supported, which uses distributed memory semantics
on top of shared memory nodes. Coherence is maintained globally, but
caching is restricted to a local region of the machine (a "node" or
"caching domain"). A directory cache is held in an on-chip cache and is
multi-banked, allowing very high transaction throughput. Directory
associativity allows the directory cache to map contents of all caches
concurrently. References off node are converted to non-allocating
references, allowing the same access mechanism (a regular load or store)
to be used for both intra-node and extra-node references. Stores
(Puts) to remote caches automatically update the caches instead of
invalidating the caches, allowing producer/consumer data sharing to occur
through cache instead of through main memory.
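
The access rules described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed apparatus: the names Node, Machine, load and store are hypothetical, each node's cache is modeled as a write-back dictionary, and the multi-banked directory is reduced to a simple membership test on the home node's cache. The handling of a Put to a line that is not cached at the home node (written straight to home memory here) is also an assumption, since the text does not specify it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One caching domain: home memory plus the lines cached inside the node."""
    node_id: int
    memory: dict = field(default_factory=dict)  # addr -> value (home memory)
    cache: dict = field(default_factory=dict)   # addr -> value (write-back cache)


class Machine:
    """Toy model: caching is restricted to the home node of each address."""

    def __init__(self, num_nodes):
        self.nodes = [Node(i) for i in range(num_nodes)]

    def load(self, requester, home, addr):
        node = self.nodes[home]
        if addr in node.cache:
            # Served from the home node's cache (stand-in for a directory hit).
            return node.cache[addr]
        value = node.memory.get(addr, 0)
        if requester == home:
            # Intra-node reference: allocate the line in the local cache.
            node.cache[addr] = value
        # Off-node reference: non-allocating, so the requester caches nothing.
        return value

    def store(self, requester, home, addr, value):
        node = self.nodes[home]
        if requester == home or addr in node.cache:
            # A local store allocates; a remote Put to a cached line updates
            # that line in place instead of invalidating it.
            node.cache[addr] = value
        else:
            # Assumption: a Put to an uncached line goes to home memory.
            node.memory[addr] = value


m = Machine(2)
m.store(1, 1, 0x10, 5)      # consumer (node 1) touches its own line
m.store(0, 1, 0x10, 7)      # producer (node 0) Puts new data remotely
print(m.load(1, 1, 0x10))   # consumer reads the updated value from its cache
```

In this model the producer and the consumer use the same load and store calls whether the address is local or remote; only the allocation behavior differs, and the producer's data reaches the consumer through the consumer's cache without a round trip through main memory.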