An apparatus for and method of enhancing reliability within a cluster lock
processing system having a relatively large number of commodity cluster
instruction processors which are managed by a cluster lock manager.
Because the commodity processors have virtually no system viability
features, such as memory protection and failure recovery, the
cluster/lock processors assume the responsibility for providing these
functions. The low cost of the commodity cluster instruction processors
makes the system almost linearly scalable. The cluster/locking, caching,
and mass storage accessing functions are fully integrated into a single
hardware platform which performs the role of the master. Upon failure of
this hardware platform, a second redundant hardware platform converts
from the slave role to the master role. The logic for failure detection and role
swapping is placed within software, which can run as an application under
a commonly available operating system. Furthermore, recovery is
accomplished entirely without the assistance of the Host computer(s) or
the ultimate user(s) coupled via the Host computer(s).
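
The software-based failure detection and slave-to-master role swap described above might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how failure is detected, so a heartbeat timeout is assumed here, and the names `Platform`, `Role`, `check_failover`, and the `FAILED` state are all hypothetical rather than taken from the disclosure.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    MASTER = "master"
    SLAVE = "slave"
    FAILED = "failed"  # assumption: the abstract does not name a state for the failed unit


@dataclass
class Platform:
    """One of the two redundant hardware platforms (hypothetical model)."""
    name: str
    role: Role
    last_heartbeat: float = 0.0  # time of the most recent heartbeat from this platform

    def beat(self, now: float) -> None:
        """Record that this platform was alive at time `now`."""
        self.last_heartbeat = now


def check_failover(master: Platform, slave: Platform, now: float, timeout: float) -> bool:
    """Promote the slave if the master's heartbeat is older than `timeout`.

    Assumed detection scheme: a missed heartbeat means the master has failed.
    Returns True only when a role swap actually happened.
    """
    if master.role is not Role.MASTER or slave.role is not Role.SLAVE:
        return False  # nothing to do unless the pair is in its normal configuration
    if now - master.last_heartbeat <= timeout:
        return False  # master is still considered healthy
    master.role = Role.FAILED
    slave.role = Role.MASTER  # the redundant platform takes over the master role
    return True


if __name__ == "__main__":
    a = Platform("A", Role.MASTER)
    b = Platform("B", Role.SLAVE)
    a.beat(0.0)
    print(check_failover(a, b, now=1.0, timeout=2.0), b.role)
    print(check_failover(a, b, now=3.5, timeout=2.0), b.role)
```

Nothing in this sketch involves a Host computer or end user: the swap is decided and carried out by the platform pair alone, which mirrors the recovery property the abstract claims.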