A multiprocessor, parallel computer is made tolerant to hardware failures
by providing extra groups of redundant standby processors and by
designing the system so that these extra groups of processors can be
swapped with any group that experiences a hardware failure. This
swapping can be performed under software control, so that the computer
can sustain a hardware failure and, once the standby processors have
been swapped in, still appear to software as a pristine, fully
functioning system.
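
One way software-controlled swapping of this kind could be modeled is with a table that maps logical group numbers, as seen by software, onto physical processor groups. The following is a minimal sketch only. The class name `GroupMap`, its methods, the group numbering, and the policy of taking the first available spare are all assumptions made for illustration and are not taken from the text above.

```python
class GroupMap:
    """Hypothetical logical-to-physical map for processor groups."""

    def __init__(self, num_active, num_spare):
        # Assumption: physical groups 0..num_active-1 start out in service,
        # and the next num_spare physical groups are held as standbys.
        self.logical_to_physical = list(range(num_active))
        self.spares = list(range(num_active, num_active + num_spare))
        self.failed = set()

    def physical(self, logical):
        """Return the physical group currently backing a logical group."""
        return self.logical_to_physical[logical]

    def report_failure(self, physical_id):
        """Retire a failed physical group, swapping in a spare if needed."""
        self.failed.add(physical_id)
        if physical_id in self.spares:
            # A standby group failed before use: just drop it from the pool.
            self.spares.remove(physical_id)
            return
        if physical_id not in self.logical_to_physical:
            # Group is not in service (already retired or unknown).
            return
        if not self.spares:
            raise RuntimeError("no standby processor group available")
        logical = self.logical_to_physical.index(physical_id)
        # Software keeps addressing the same logical group number;
        # only the physical group behind it changes.
        self.logical_to_physical[logical] = self.spares.pop(0)


if __name__ == "__main__":
    groups = GroupMap(num_active=4, num_spare=2)
    groups.report_failure(1)
    print(groups.logical_to_physical)
```

In this sketch, software continues to address logical groups 0 through 3 after a failure, which corresponds to the system still appearing fully functional. How the physical interconnect is actually rerouted to the standby group is outside what the sketch covers.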