A failover algorithm implemented in software, without any failover-specific
hardware, that allows servers in a cluster to determine whether a primary
or secondary controller is active without requiring communication between
the primary and secondary controllers. A server cluster includes several
servers coupled to two additional servers, which are designated as a primary
controller and a secondary controller. While the server cluster is
operational, either the primary controller or the secondary controller
will be actively controlling the cluster. Software running on the servers
of the cluster, on the primary controller, and on the secondary
controller, cooperates to ensure that each server will properly identify
which controller is active at any particular time, including, but not
limited to, upon starting up the server cluster, upon adding one or more
servers to a cluster that is already operational, and upon failure of an
active controller, a server, or a link between an active controller and a
server. The failover algorithm includes the following steps performed by
each of a group of servers in the cluster for identifying which controller
is active: making the server's own assessment of the active controller;
and identifying either the primary controller or the secondary controller
as a consensus active controller based upon a majority vote over the
individual assessments made by the servers in the cluster as to which
controller is the active controller.
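
The majority-vote step can be illustrated with a minimal sketch. It is not
the claimed implementation: the function name, the string labels, and the
handling of ties and unrecognized votes are all assumptions made for
illustration, since the abstract does not specify them. The sketch also
assumes each server's own assessment has already been collected into a list.

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical labels for the two controllers (not taken from the source).
PRIMARY = "primary"
SECONDARY = "secondary"


def consensus_active_controller(assessments: Iterable[str]) -> Optional[str]:
    """Return the controller named by a strict majority of assessments.

    Assumption: votes naming neither controller are ignored, and a tie or
    an empty set of votes yields None (no consensus).
    """
    # Tally only the votes that name one of the two controllers.
    votes = Counter(a for a in assessments if a in (PRIMARY, SECONDARY))
    total = sum(votes.values())
    for controller in (PRIMARY, SECONDARY):
        # Strict majority: more than half of the counted votes.
        if votes[controller] * 2 > total:
            return controller
    return None
```

For example, under these assumptions three servers reporting primary,
primary, and secondary would agree on the primary controller, while an even
split would produce no consensus and would need some further rule that the
abstract does not describe.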