The present invention provides a framework for managing both clustering
and data replication in a software system distributed across multiple
nodes. The framework includes at least one agent running on the nodes
that make up the distributed system. The framework also includes a master
to coordinate clustering and replication operations. The framework
further includes a library of software programs, called primitives, that
are used by agents to communicate with the master. The agent(s) obtain
cluster status information and replication status information, which are
used by the master to manage clustering and replication operations. The
framework is designed to work with existing cluster management
applications and data replication facilities. The framework provides
status information needed for coordinating clustering and replication
operations to ensure that applications and data remain in a consistent
state for disaster recovery purposes.
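
The relationship among agents, primitives, and the master described above can be illustrated with a minimal sketch. It is not the claimed implementation: every name in it (StatusReport, report_status, Agent, Master, consistent_nodes) and the status values "online", "offline", "in_sync", and "lagging" are hypothetical and chosen only for illustration. The sketch assumes each agent obtains its status values from an existing cluster management application and replication facility, which are represented here as plain arguments.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StatusReport:
    """Hypothetical message carrying one node's status to the master."""
    node: str
    cluster_status: str       # assumed values: "online" or "offline"
    replication_status: str   # assumed values: "in_sync" or "lagging"


class Master:
    """Keeps the latest report per node and derives a coordinated view."""

    def __init__(self) -> None:
        self.reports: Dict[str, StatusReport] = {}

    def receive(self, report: StatusReport) -> None:
        # A newer report from the same node replaces the older one.
        self.reports[report.node] = report

    def consistent_nodes(self) -> List[str]:
        # Nodes that are both clustered and fully replicated; in this
        # sketch these are the nodes considered safe for recovery.
        return sorted(
            name
            for name, r in self.reports.items()
            if r.cluster_status == "online"
            and r.replication_status == "in_sync"
        )


def report_status(master: Master, report: StatusReport) -> None:
    """Hypothetical 'primitive': the only path from an agent to the master."""
    master.receive(report)


class Agent:
    """Runs on a node and forwards status it has obtained to the master."""

    def __init__(self, node: str, master: Master) -> None:
        self.node = node
        self.master = master

    def poll(self, cluster_status: str, replication_status: str) -> None:
        # A real agent would query the cluster manager and the replication
        # facility; here the values are passed in directly.
        report = StatusReport(self.node, cluster_status, replication_status)
        report_status(self.master, report)


if __name__ == "__main__":
    master = Master()
    Agent("node-a", master).poll("online", "in_sync")
    Agent("node-b", master).poll("online", "lagging")
    print(master.consistent_nodes())
```

In this sketch the agents never call the master directly; they go through the primitive, which mirrors the role the primitives library plays in the framework and keeps the transport between agent and master replaceable.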