An architecture and coherency protocol for use in a large SMP computer
system includes a hierarchical switch structure which allows for a number
of multi-processor nodes to be coupled to the switch and to operate at
optimum performance. Within each multi-processor node, a simultaneous
buffering system is provided that allows all of the processors of the
multi-processor node to operate at peak performance. A memory is shared
among the nodes, with a portion of the memory resident at each of the
multi-processor nodes. Each of the multi-processor nodes includes a number
of elements for maintaining memory coherency, including a victim cache, a
directory and a transaction tracking table. The victim cache allows for
selective updates of victim data destined for memory stored at a remote
multi-processor node, thereby improving the overall performance of
memory. Memory performance is additionally improved by including, at each
memory, a delayed write buffer which is used in conjunction with the
directory to identify victims that are to be written to memory. An arb bus
coupled to the output of the directory of each node provides a central
ordering point for all messages that are transferred through the SMP system. The
messages comprise a number of transactions, and each transaction is
assigned to a number of different virtual channels, depending upon the
processing stage of the message. The use of virtual channels thus helps to
maintain data coherency by providing a straightforward method for
maintaining system order. Using the virtual channels and the directory
structure, cache coherency problems that would otherwise result in
deadlock can be avoided.
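
The interplay between a central ordering point and stage-based virtual channels can be illustrated with a minimal Python sketch. It is not the system's actual implementation: the stage names (REQUEST, FORWARD, RESPONSE), the `OrderingPoint` class, and the rule of draining the most advanced stage first are all assumptions made for illustration. The sketch shows only the general idea that giving each processing stage its own queue keeps a backlog of new requests from blocking the responses that would complete earlier transactions.

```python
from collections import deque
from enum import IntEnum


class Stage(IntEnum):
    # Hypothetical stage names; the text says only that the channel
    # depends on the processing stage of the message.
    REQUEST = 0
    FORWARD = 1
    RESPONSE = 2


class OrderingPoint:
    """Toy serialization point with one queue (virtual channel) per stage."""

    def __init__(self):
        self._next_seq = 0
        self.channels = {stage: deque() for stage in Stage}

    def submit(self, stage, address, payload=None):
        # Every message receives a global sequence number, which fixes
        # a single system-wide order of arrival.
        seq = self._next_seq
        self._next_seq += 1
        self.channels[stage].append((seq, address, payload))
        return seq

    def drain_one(self):
        # Assumed policy: serve the most advanced stage first, so queued
        # requests can never block a response from making progress.
        for stage in sorted(Stage, reverse=True):
            if self.channels[stage]:
                return stage, self.channels[stage].popleft()
        return None


def demo():
    op = OrderingPoint()
    op.submit(Stage.REQUEST, 0x100)
    op.submit(Stage.REQUEST, 0x200)
    op.submit(Stage.RESPONSE, 0x100, payload="data")
    order = []
    while (item := op.drain_one()) is not None:
        stage, (seq, address, _) = item
        order.append((stage.name, seq, hex(address)))
    return order
```

Calling `demo()` drains the response (sequence number 2) before the two earlier requests, while the requests themselves keep their arrival order within their own channel.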