A Fibre Channel storage area network (SAN) provides virtualized storage space
to a number of servers as a number of virtual disks implemented on various virtual
redundant array of inexpensive disks (RAID) devices striped across a plurality
of physical disk drives. The SAN includes plural controllers and communication
paths to allow for fail-safe and fail-over operation. The plural controllers can
be loosely coupled to provide n-way redundancy and have more than one independent
channel for communicating with one another. In the event of a failure involving
a controller or controller interface, the virtual disks that are accessed via the
affected interfaces are re-mapped to another interface in order to continue to
provide high data availability. Failures are detected using deadman timers, heartbeat
signals internal to each controller, and heartbeat signals between different controllers,
which together reveal controllers that are no longer communicating with their peers
and thereby identify those controllers that are failing or have failed.
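
The detection and re-mapping behavior described above can be illustrated with a minimal sketch. It is not the actual controller implementation: the `Controller` class, the `find_failed` and `remap` functions, the timeout value, and the least-loaded placement policy are all hypothetical names and assumptions introduced here for illustration. A controller is treated as failed when its most recent heartbeat is older than a timeout, much as a deadman timer would expire, and its virtual disks are then re-mapped to surviving controllers.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    # Hypothetical model of a controller; not taken from the source.
    name: str
    last_heartbeat: float  # time at which the last heartbeat was observed
    virtual_disks: list = field(default_factory=list)


def find_failed(controllers, now, timeout):
    """Return controllers whose last heartbeat is older than `timeout` seconds."""
    return [c for c in controllers if now - c.last_heartbeat > timeout]


def remap(controllers, failed):
    """Move each failed controller's virtual disks to a surviving controller.

    Assumption: the target is the survivor currently serving the fewest
    virtual disks. The source does not specify a placement policy.
    """
    failed_names = {c.name for c in failed}
    survivors = [c for c in controllers if c.name not in failed_names]
    if not survivors:
        raise RuntimeError("no surviving controller to take over virtual disks")
    for dead in failed:
        for vdisk in dead.virtual_disks:
            target = min(survivors, key=lambda c: len(c.virtual_disks))
            target.virtual_disks.append(vdisk)
        dead.virtual_disks = []
    return survivors


# Three controllers; ctrl-a has been silent for 10 seconds.
controllers = [
    Controller("ctrl-a", last_heartbeat=100.0, virtual_disks=["vd0", "vd1"]),
    Controller("ctrl-b", last_heartbeat=109.0, virtual_disks=["vd2"]),
    Controller("ctrl-c", last_heartbeat=108.5, virtual_disks=[]),
]
failed = find_failed(controllers, now=110.0, timeout=5.0)
remap(controllers, failed)
```

In this example only ctrl-a exceeds the 5-second timeout, so its two virtual disks are redistributed across ctrl-b and ctrl-c while the servers continue to reach them through a working interface. A real system would also need to handle a controller that is still running but unreachable on one channel, which is why more than one independent inter-controller channel is useful.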