A method/system is disclosed for recovering computing capacity and critical
software applications after a catastrophic failure. The method/system involves
distributing the computing capacity over multiple computing clusters,
each computing cluster having concurrent access to shared data and
software applications of the other computing clusters. Sufficient backup
computing capacity is reserved on each computing cluster to recover some
or all active computing capacity on the other computing clusters. Message
traffic throughout the computing clusters is monitored for indications of
a catastrophic failure. Upon confirmation of a catastrophic failure at
one computing cluster, the workloads of that computing cluster are
transferred to the backup computing capacity of the other computing
clusters. Software applications that have been designated for recovery
are then brought up on the backup computing capacity of the other
computing clusters. Such an arrangement allows computing capacity and
critical software applications to be quickly recovered after a
catastrophic failure.
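
The recovery flow described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the names `Workload`, `Cluster`, `failure_confirmed`, and `fail_over` are hypothetical, the missed-heartbeat threshold is an assumed stand-in for whatever indications the message-traffic monitoring actually uses, and the placement rule (largest workload first, onto the surviving cluster with the most free backup capacity) is only one plausible choice.

```python
from dataclasses import dataclass, field


@dataclass
class Workload:
    name: str
    capacity: int           # capacity units the workload needs (assumed unit)
    recover: bool = True    # True if designated for recovery after a failure


@dataclass
class Cluster:
    name: str
    backup_capacity: int    # reserved capacity, idle until a failover
    workloads: list = field(default_factory=list)
    missed_heartbeats: int = 0  # assumed proxy for monitored message traffic


def failure_confirmed(cluster, threshold=3):
    """Assumed rule: a run of missed heartbeats confirms a failure."""
    return cluster.missed_heartbeats >= threshold


def fail_over(failed, survivors):
    """Move designated workloads of `failed` onto survivors' backup capacity.

    Returns the names of designated workloads that could not be placed.
    """
    unplaced = []
    # Place the largest workloads first so they are not starved of space.
    for w in sorted(failed.workloads, key=lambda w: w.capacity, reverse=True):
        if not w.recover:
            continue  # not designated for recovery, so it is not brought up
        # Pick the survivor with the most remaining backup capacity.
        target = max(survivors, key=lambda c: c.backup_capacity)
        if target.backup_capacity >= w.capacity:
            target.backup_capacity -= w.capacity
            target.workloads.append(w)
        else:
            unplaced.append(w.name)
    failed.workloads = []  # the failed cluster no longer runs anything
    return unplaced
```

In this sketch, a caller would check `failure_confirmed(cluster)` for each cluster and, once it returns true, call `fail_over(cluster, others)`. A non-empty return value signals that the reserved backup capacity was not sufficient for every designated workload.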