A cluster of computers typically establishes a quorum, i.e., a software
method for establishing agreement, to coordinate access to shared
resources, such as a shared data store, in applications that must recover
from the failure of one or more computers or their associated components.
Prior art that associates a single quorum with an entire cluster has
inherent overheads that limit the size of the cluster to a small number
of computers. The present invention comprises a scalable, software-based
architecture for implementing a quorum mechanism to coordinate the
actions of a cluster of computers. In contrast to prior art, the present
invention advantageously encapsulates the quorum in a software construct,
called a quorum object, which is disassociated from the cluster as a
whole and spans a designated subset of the cluster's membership. By
employing multiple quorum objects that are distributed across the
cluster's membership, the cluster can uniformly scale to a large number
of computers that handle a scalable processing workload, such as a
partitioned database management system. The software methods that
implement one embodiment of the present invention are described in
detail.
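
As an illustration of the quorum object concept described above, the following is a minimal sketch rather than the claimed implementation. It assumes that each quorum object decides by a strict majority of the computers it spans and that the cluster is partitioned into fixed-size groups; the text specifies neither. The names `QuorumObject`, `build_quorum_objects`, `mark_failed`, and `has_quorum` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class QuorumObject:
    """Hypothetical quorum scoped to a designated subset of the cluster."""
    name: str
    members: frozenset                       # computers this quorum object spans
    alive: set = field(default_factory=set)  # members currently considered healthy

    def __post_init__(self):
        # Assumption: every spanned member starts out healthy.
        self.alive = set(self.members)

    def mark_failed(self, node):
        # Record a failure; nodes outside this quorum object are ignored.
        self.alive.discard(node)

    def has_quorum(self):
        # Assumed rule: a strict majority of the spanned members must be alive.
        return len(self.alive) > len(self.members) // 2


def build_quorum_objects(nodes, span):
    """Split the cluster into consecutive groups of `span` nodes,
    creating one quorum object per group (an assumed layout)."""
    return [
        QuorumObject(f"q{i // span}", frozenset(nodes[i:i + span]))
        for i in range(0, len(nodes), span)
    ]


nodes = [f"node{i}" for i in range(9)]
quorums = build_quorum_objects(nodes, span=3)

# Two failures inside q0 cost only q0 its quorum; q1 and q2 are unaffected.
quorums[0].mark_failed("node0")
quorums[0].mark_failed("node1")
status = {q.name: q.has_quorum() for q in quorums}
print(status)
```

Because each quorum object tracks only its own members, the cost of reaching agreement in this sketch depends on the size of the subset rather than on the size of the whole cluster, which is the scaling property the abstract attributes to multiple distributed quorum objects.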