A system for effecting recovery of a network involving a plurality of computing
apparatuses, each computing apparatus hosting at least one respective service,
includes: (a) at least one control unit substantially embodied in hardware
and coupled with each computing apparatus; and (b) at least one control program
substantially embodied in software and distributed among at least one of the computing
apparatuses. The system responds to a computing apparatus becoming inoperative
by effecting a recovery operation. The recovery operation includes distributing
the services hosted by the inoperative computing apparatus as distributed services
among the operating computing apparatuses and returning the distributed services to
the formerly inoperative computing apparatus after that computing apparatus again
becomes operative. The at least one control unit and the at least one control program cooperate
to effect the recovery operation.
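
The recovery operation described above can be illustrated with a minimal sketch. It is not the disclosed implementation. The names Node, RecoveryCoordinator, on_failure and on_restore are hypothetical, the round-robin placement is an assumption (the text names no distribution policy), and a single software object stands in for the cooperating hardware control unit and software control program. Failure detection and the case where a temporary host itself fails are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A computing apparatus hosting zero or more services (hypothetical model)."""
    name: str
    services: list = field(default_factory=list)
    operative: bool = True


class RecoveryCoordinator:
    """Stand-in for the cooperating control unit and control program."""

    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}
        # Maps a failed node's name to [(service, temporary_host_name), ...].
        self.relocated = {}

    def on_failure(self, name):
        """Distribute the failed node's services among operating nodes."""
        failed = self.nodes[name]
        failed.operative = False
        hosts = [n for n in self.nodes.values() if n.operative]
        if not hosts:
            raise RuntimeError("no operating node available")
        moved = []
        # Round-robin placement is an assumption; the source names no policy.
        for i, service in enumerate(failed.services):
            host = hosts[i % len(hosts)]
            host.services.append(service)
            moved.append((service, host.name))
        failed.services = []
        self.relocated[name] = moved

    def on_restore(self, name):
        """Return the distributed services once the node is operative again."""
        restored = self.nodes[name]
        restored.operative = True
        for service, host_name in self.relocated.pop(name, []):
            self.nodes[host_name].services.remove(service)
            restored.services.append(service)


# Usage: node "a" fails, its services move to "b" and "c", then return.
nodes = [Node("a", ["web", "db"]), Node("b", ["mail"]), Node("c", ["dns"])]
coordinator = RecoveryCoordinator(nodes)
coordinator.on_failure("a")
coordinator.on_restore("a")
```

Recording where each service was placed is what lets the return step undo exactly the earlier distribution, leaving the services originally hosted by the other apparatuses untouched.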