Improved techniques are provided for detecting and correcting errors and
skew in inter-cluster communications within computer systems having a
plurality of multi-processor clusters. The local nodes of each cluster
include a plurality of processors and an interconnection controller.
Intra-cluster links are formed between the local nodes, including the
interconnection controller, within a cluster. Inter-cluster links are
formed between interconnection controllers of different clusters.
Intra-cluster packets may be serialized and encapsulated as inter-cluster
packets for transmission on inter-cluster links, preferably with
link-layer encapsulation. Each inter-cluster packet may include a
sequence identifier and error information computed for that packet. Clock
data may be embedded in symbols sent on each bit lane of the
inter-cluster links. Copies of transmitted inter-cluster packets may be
stored until an acknowledgement is received, so that a packet can be
retransmitted if an error is detected. The use of inter-cluster
packets on an inter-cluster link is preferably transparent to other links
and to the protocol layer.
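
The encapsulation and acknowledgement scheme summarized above can be
illustrated with a minimal software sketch. It is not the claimed hardware
implementation. The frame layout (a 16-bit sequence identifier and a 32-bit
length, followed by the payload and a CRC-32 trailer), the choice of CRC-32
as the error information, and the names encapsulate, decapsulate, and Sender
are all assumptions made for illustration only.

```python
import struct
import zlib

# Hypothetical frame layout (an assumption, not taken from the source):
# 16-bit sequence identifier, 32-bit payload length, payload, CRC-32 trailer.
HEADER = struct.Struct(">HI")
TRAILER = struct.Struct(">I")


def encapsulate(seq_id: int, intra_packet: bytes) -> bytes:
    """Wrap a serialized intra-cluster packet as an inter-cluster frame."""
    header = HEADER.pack(seq_id & 0xFFFF, len(intra_packet))
    # Error information is computed over this packet's header and payload.
    crc = zlib.crc32(header + intra_packet) & 0xFFFFFFFF
    return header + intra_packet + TRAILER.pack(crc)


def decapsulate(frame: bytes):
    """Return (seq_id, intra_packet), or None if the frame fails its checks."""
    if len(frame) < HEADER.size + TRAILER.size:
        return None
    seq_id, length = HEADER.unpack_from(frame, 0)
    body_end = HEADER.size + length
    if body_end + TRAILER.size != len(frame):
        return None  # length field inconsistent with the received frame
    (crc,) = TRAILER.unpack_from(frame, body_end)
    if zlib.crc32(frame[:body_end]) & 0xFFFFFFFF != crc:
        return None  # corruption detected
    return seq_id, frame[HEADER.size:body_end]


class Sender:
    """Keeps a copy of each transmitted frame until it is acknowledged."""

    def __init__(self):
        self.next_seq = 0
        self.unacked = {}  # seq_id -> stored copy of the frame

    def send(self, intra_packet: bytes) -> bytes:
        seq = self.next_seq
        frame = encapsulate(seq, intra_packet)
        self.unacked[seq] = frame  # retained for possible retransmission
        self.next_seq = (seq + 1) & 0xFFFF
        return frame

    def acknowledge(self, seq_id: int) -> None:
        # The stored copy is released once the receiver acknowledges it.
        self.unacked.pop(seq_id, None)

    def pending(self):
        """Frames still awaiting acknowledgement, oldest first."""
        return list(self.unacked.values())
```

In this sketch the receiver sees only the original intra-cluster packet
after decapsulation, which mirrors the transparency to other links and to
the protocol layer noted above. The sketch does not model clock embedding
or per-lane skew correction, which are physical-layer matters.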