One embodiment of the present invention provides a method and an apparatus
that ensure proper semantics for operations when they are restarted
on a secondary server in the event of a failure of a primary server. This
embodiment keeps a record on the secondary server of which operation
currently has exclusive access to a shared resource. The method operates
by receiving a message from the primary server indicating that a first
operation on the primary server has acquired exclusive access to the
shared resource. In response to this message, the system updates state
information, at the secondary server, to indicate that the first operation
has exclusive access to the shared resource and that any prior operations
have completed their exclusive accesses to the shared resource. Upon
receiving notification that the primary server has failed, the secondary
server is configured to act as a new primary server. When the secondary
server subsequently receives an operation retry request from a client of
the primary server, it responds in one of several ways. If the
operation retry request is for the first operation, the system knows that the
first operation had exclusive access to the shared resource on the primary
server. In this case, the secondary server acquires exclusive access to
the shared resource and completes the first operation. If the operation
retry request is for a prior operation that has already completed, and the
client has not yet received that operation's result, the system returns the
saved result of the prior operation to the client. Another embodiment of the
present invention includes more than one secondary server.
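The bookkeeping and retry handling described above might be sketched roughly as follows. This is a minimal, non-authoritative illustration, not the patented implementation. The class and method names (`SecondaryServer`, `on_lock_acquired`, `handle_retry`, and so on) are hypothetical, a local `threading.Lock` stands in for exclusive access to the shared resource, and the `on_result_checkpoint` message, by which the primary forwards results so the secondary has a "saved result" to return, is an assumption because the text does not say how results reach the secondary.

```python
import threading
from typing import Any, Callable, Dict, Optional, Set


class SecondaryServer:
    """Hypothetical secondary-server state for failover of exclusive-access operations."""

    def __init__(self) -> None:
        # Stands in for exclusive access to the shared resource (assumption).
        self.lock = threading.Lock()
        # Operation the primary last reported as holding exclusive access.
        self.current_holder: Optional[str] = None
        # Operations known to have finished their exclusive access.
        self.completed: Set[str] = set()
        # Results forwarded by the primary (assumed checkpoint message).
        self.saved_results: Dict[str, Any] = {}
        # Operations whose result the client is known to have received.
        self.delivered: Set[str] = set()
        self.is_primary = False

    def on_lock_acquired(self, op_id: str) -> None:
        # A new holder implies the previous holder has completed its exclusive access.
        if self.current_holder is not None and self.current_holder != op_id:
            self.completed.add(self.current_holder)
        self.current_holder = op_id

    def on_result_checkpoint(self, op_id: str, result: Any) -> None:
        # Hypothetical message: the primary forwards a result so it can be replayed later.
        self.saved_results[op_id] = result

    def on_primary_failed(self) -> None:
        # After failover, this server acts as the new primary.
        self.is_primary = True

    def handle_retry(self, op_id: str, execute: Callable[[], Any]) -> Any:
        if not self.is_primary:
            raise RuntimeError("retries are only served after failover")
        if op_id == self.current_holder:
            # This operation held exclusive access on the failed primary:
            # reacquire exclusive access here and complete it.
            with self.lock:
                result = execute()
            self.current_holder = None
            self.completed.add(op_id)
            self.saved_results[op_id] = result
            self.delivered.add(op_id)
            return result
        if op_id in self.completed and op_id not in self.delivered:
            # A prior operation already finished: return its saved result
            # instead of executing it a second time.
            self.delivered.add(op_id)
            return self.saved_results.get(op_id)
        # The text does not specify other cases; this sketch simply rejects them.
        raise LookupError(f"no retry action defined for {op_id!r}")
```

In this sketch, replaying the saved result rather than calling `execute` again is what keeps a completed operation from running twice after failover.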