A method for managing HPC node failure includes determining that one of a
plurality of HPC nodes has failed, with each HPC node comprising an
integrated fabric. The failed node is then removed from a virtual list of
HPC nodes, with the virtual list comprising one logical entry for each of
the plurality of HPC nodes.
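
The two steps above (determining that a node has failed, then removing it from the virtual list) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the source: the `VirtualNodeList` class, the `handle_failures` function, the integer node IDs, and the `is_alive` health-check callback are all hypothetical. The source does not say how failure is detected or how the virtual list is stored.

```python
from typing import Callable, Dict, Iterable, List


class VirtualNodeList:
    """Hypothetical virtual list holding one logical entry per HPC node."""

    def __init__(self, node_ids: Iterable[int]) -> None:
        # One logical entry per node; the entry here is just a label (assumption).
        self._entries: Dict[int, str] = {nid: f"node-{nid}" for nid in node_ids}

    def remove(self, node_id: int) -> bool:
        """Remove the logical entry for node_id; return True if it was present."""
        return self._entries.pop(node_id, None) is not None

    def node_ids(self) -> List[int]:
        """Return the IDs of the nodes still present in the list, in sorted order."""
        return sorted(self._entries)


def handle_failures(
    vlist: VirtualNodeList, is_alive: Callable[[int], bool]
) -> List[int]:
    """Determine which listed nodes have failed and remove them from the list.

    is_alive is a hypothetical health check supplied by the caller.
    """
    # Step 1: determine which nodes have failed.
    failed = [nid for nid in vlist.node_ids() if not is_alive(nid)]
    # Step 2: remove each failed node's logical entry from the virtual list.
    for nid in failed:
        vlist.remove(nid)
    return failed
```

For example, given four nodes with IDs 0 through 3 and a health check that reports node 2 as down, `handle_failures` returns `[2]` and the virtual list retains entries for nodes 0, 1, and 3.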