Methods and systems for performing arithmetic functions. In accordance
with a first aspect of the invention, methods and apparatus are provided
that combine software algorithms with a hardware implementation of class
network routing to achieve a very significant reduction in the time
required for global arithmetic operations on the torus. This reduction
leads to greater scalability of applications running on large parallel
machines. The invention involves three steps in improving the efficiency
and accuracy of global operations: (1) Ensuring, when necessary, that all
the nodes do the global operation on the data in the same order and so
obtain a unique answer, independent of roundoff error; (2) Using the
topology of the torus to minimize the number of hops and the
bidirectional capabilities of the network to reduce the number of time
steps in the data transfer operation to an absolute minimum; and (3)
Using class function routing to reduce latency in the data transfer. With
the method of this invention, each element is injected into the network
only once and is then stored and forwarded without any further software
overhead. In accordance with a second aspect of the invention,
methods and systems are provided to efficiently implement global
arithmetic operations on a network that supports global combining
operations. These methods greatly reduce the latency of such global
operations.
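
Steps (1) and (2) above can be illustrated with a minimal Python sketch. It
is not the patented implementation: the function names (ring_hops,
torus_hops, ordered_global_sum) and the 8 x 8 x 8 torus size are assumptions
chosen for illustration, and class function routing, which is a hardware
feature, is not modeled. The sketch shows why a fixed summation order matters
for floating-point data and how bidirectional links shorten the path on each
torus dimension.

```python
from typing import Sequence


def ring_hops(src: int, dst: int, size: int) -> int:
    """Minimum hops between two positions on one torus dimension.

    With bidirectional links a packet may travel either way around
    the ring, so the shorter of the two directions is taken.
    """
    d = abs(src - dst) % size
    return min(d, size - d)


def torus_hops(a: Sequence[int], b: Sequence[int], dims: Sequence[int]) -> int:
    """Minimum hop count between nodes a and b on a torus of shape dims."""
    return sum(ring_hops(x, y, n) for x, y, n in zip(a, b, dims))


def ordered_global_sum(contributions: Sequence[float]) -> float:
    """Add contributions strictly left to right.

    If every node applies this same order to the same data, every node
    obtains a bit-identical result, whatever the roundoff error is.
    """
    total = 0.0
    for value in contributions:
        total += value
    return total


if __name__ == "__main__":
    # The same three values summed in two different orders give
    # different floating-point answers, hence the need for one order.
    print(ordered_global_sum([1e16, 1.0, -1e16]))
    print(ordered_global_sum([1e16, -1e16, 1.0]))

    # Wraparound links make node 7 a single hop from node 0 on a ring of 8.
    print(torus_hops((0, 0, 0), (7, 4, 1), (8, 8, 8)))
```

In this sketch the farthest node on a ring of 8 is 4 hops away rather than 7,
which is the kind of saving step (2) refers to.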