A novel massively parallel supercomputer, scalable to hundreds of
teraOPS, includes node architectures based upon System-On-a-Chip
technology, i.e., each processing node comprises a single Application
Specific Integrated Circuit (ASIC). Within each ASIC node is a plurality
of processing elements, each of which consists of a central processing
unit (CPU) and a plurality of floating point processors, to achieve an
optimal balance of computational performance, packaging density, low
cost, and power and cooling requirements. The plurality of processors
within a single node may be used individually or simultaneously to work
on any combination of computation or communication, as required by the
particular algorithm being solved or executed at any point in time. The
system-on-a-chip ASIC nodes are interconnected by multiple independent
networks that maximize packet communications throughput and minimize
latency. In the preferred embodiment, the multiple networks include
three high-speed networks for parallel algorithm message passing: a
Torus network, a Global Tree network, and a Global Asynchronous network
that provides global barrier and notification functions. These multiple
independent networks may be used collaboratively or independently,
according to the needs or phases of an algorithm, to optimize algorithm
processing performance.
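To make the Torus network concrete, the following is a minimal sketch of nearest-neighbor addressing on a torus. The text above does not state the torus dimensionality; this sketch assumes a three-dimensional torus purely for illustration. The function names `torus_neighbors` and `torus_hops` and the 8 x 8 x 8 size are hypothetical. The sketch shows how wraparound links give every node six nearest neighbors and keep the worst-case hop distance to half of each ring's length.

```python
# Hypothetical sketch: neighbor addressing on a torus, ASSUMED to be 3-D.
# Each node has coordinates (x, y, z); links wrap around at the edges,
# so every node has six nearest neighbors (+/-1 in each dimension).

from typing import List, Tuple

Coord = Tuple[int, int, int]


def torus_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Return the six nearest neighbors of `node` on a torus of size `dims`."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(node)
            # Modular arithmetic implements the wraparound links.
            coord[axis] = (coord[axis] + step) % dims[axis]
            neighbors.append(tuple(coord))
    return neighbors


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimal hop count between two nodes, taking the shorter way around each ring."""
    total = 0
    for axis in range(3):
        d = abs(a[axis] - b[axis])
        total += min(d, dims[axis] - d)
    return total


if __name__ == "__main__":
    dims = (8, 8, 8)  # illustrative size only
    print(torus_neighbors((0, 0, 0), dims))
    print(torus_hops((0, 0, 0), (7, 7, 7), dims))  # one wraparound hop per axis
```

The wraparound is what distinguishes a torus from a plain mesh. Without it, nodes (0, 0, 0) and (7, 7, 7) would be 21 hops apart instead of 3.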
For particular classes of parallel algorithms, or parts of parallel
calculations, this architecture exhibits exceptional computational
performance and may enable calculations for new classes of parallel
algorithms. Additional networks are provided for external connectivity
and are used for Input/Output, System Management and Configuration, and
Debug and Monitoring functions. Special node packaging techniques,
implemented with midplanes and other hardware devices, facilitate
partitioning of the supercomputer in multiple networks to optimize the
use of supercomputing resources.
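The Global Tree and Global Asynchronous networks named above serve collective operations rather than point-to-point traffic. The following hedged sketch models only the *semantics* of two such collectives in software: a global combine, in which values flow up a tree and the result is returned to every node, and a global barrier, expressed as a logical AND across all nodes. The heap-ordered binary tree, the helper names `tree_reduce`, `allreduce`, and `barrier_released`, and the requirement that the operator be associative and commutative are all assumptions for illustration. This is not a description of how either network is built in hardware.

```python
# Hypothetical model of tree-based collectives (semantics only, not hardware).
# Nodes are numbered 0..n-1 in heap order: node i's children are 2i+1 and 2i+2.
# ASSUMES `op` is associative and commutative, since the combine order follows
# the tree shape rather than node index order.

import operator
from typing import Callable, List, TypeVar

T = TypeVar("T")


def tree_reduce(values: List[T], op: Callable[[T, T], T], node: int = 0) -> T:
    """Combine the values in the subtree rooted at `node`, children into parent."""
    acc = values[node]
    for child in (2 * node + 1, 2 * node + 2):
        if child < len(values):
            acc = op(acc, tree_reduce(values, op, child))
    return acc


def allreduce(values: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Reduce up to the root, then 'broadcast' so every node holds the same result."""
    result = tree_reduce(values, op)
    return [result] * len(values)


def barrier_released(arrived: List[bool]) -> bool:
    """Barrier semantics as a logical-AND combine: release only when all nodes arrive."""
    return tree_reduce(arrived, operator.and_)


if __name__ == "__main__":
    print(allreduce([1, 2, 3, 4, 5, 6, 7], operator.add))
    print(barrier_released([True, True, False, True]))
```

A tree lets each node combine only its own value with its children's partial results. This makes the combine depth grow with the logarithm of the node count rather than linearly, which is the usual motivation for dedicating a network to this pattern.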