A shared memory parallel processing system interconnected by a multi-stage
network combines new system configuration techniques with special-purpose
hardware to provide remote memory accesses across the network, while
controlling cache coherency efficiently across the network. The system
configuration techniques include a systematic method for partitioning and
controlling the memory in relation to local versus remote accesses and
changeable versus unchangeable data. Most of the special-purpose hardware
is implemented in the memory controller and network adapter, which
implement three send FIFOs and three receive FIFOs at each node to
segregate and efficiently handle invalidate functions, remote stores, and
remote accesses requiring cache coherency. The segregation of these three
functions into different send and receive FIFOs greatly facilitates the
cache coherency function over the network. In addition, the network itself
is tailored to provide the best efficiency for remote accesses.
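
The three-way FIFO segregation described above can be illustrated with a minimal software sketch. It is only a model under stated assumptions, not the hardware design: the names MsgKind, NodeAdapter, enqueue_send, and transmit_one are hypothetical, the network is reduced to a direct hand-off between nodes, and the message payloads are invented for illustration. It shows one send FIFO and one receive FIFO per message class at each node, so that traffic of one class does not queue behind traffic of another.

```python
from collections import deque
from enum import Enum


class MsgKind(Enum):
    """The three message classes named in the abstract."""
    INVALIDATE = "invalidate"
    REMOTE_STORE = "remote_store"
    COHERENT_ACCESS = "coherent_access"  # remote accesses requiring cache coherency


class NodeAdapter:
    """Hypothetical model of one node's network adapter (illustration only)."""

    def __init__(self, node_id):
        self.node_id = node_id
        # One send FIFO and one receive FIFO per message class: three of each.
        self.send_fifos = {kind: deque() for kind in MsgKind}
        self.recv_fifos = {kind: deque() for kind in MsgKind}

    def enqueue_send(self, kind, dest, payload):
        """Queue an outgoing message in the send FIFO for its class."""
        self.send_fifos[kind].append((dest, payload))

    def transmit_one(self, kind, nodes):
        """Move the oldest message of one class to the destination's matching
        receive FIFO. Returns False if that send FIFO is empty."""
        if not self.send_fifos[kind]:
            return False
        dest, payload = self.send_fifos[kind].popleft()
        nodes[dest].recv_fifos[kind].append((self.node_id, payload))
        return True


# Two nodes; the "network" here is just a dictionary lookup (an assumption).
nodes = {0: NodeAdapter(0), 1: NodeAdapter(1)}
nodes[0].enqueue_send(MsgKind.REMOTE_STORE, 1, {"addr": 0x100, "value": 7})
nodes[0].enqueue_send(MsgKind.INVALIDATE, 1, {"addr": 0x200})

# The invalidate is sent first even though the remote store was queued earlier,
# because each class has its own FIFO.
sent = nodes[0].transmit_one(MsgKind.INVALIDATE, nodes)
```

In this sketch, ordering is preserved within each class but not across classes, which is the property that separate FIFOs provide; how the actual hardware arbitrates among the three FIFOs is not stated here and is left out of the model.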