Fault tolerant operation is disclosed for a primary instance, such as a
process, thread, application, or processor, using an active copy-cat
instance, a.k.a. backup instance, that mirrors operations in the primary
instance, but only after those operations have successfully completed in
the primary instance. Fault tolerant logic monitors inputs and outputs of
the primary instance and gates those inputs to the backup instance once a
given input has been processed. The outputs of the backup instance are
then compared with the outputs of the primary instance to ensure correct
operation. The disclosed embodiments further relate to a fault tolerant
failover mechanism allowing the backup instance to take over for the
primary instance in a fault situation, wherein the primary and backup
instances are loosely coupled, i.e., they need not be aware that they are
operating in a fault tolerant environment. As such, the primary instance
need not be specifically designed or programmed to interact with the
fault tolerant mechanisms. Instead, the primary instance need only be
designed to adhere to specific basic operating guidelines and shut itself
down when it cannot do so. By externally controlling the ability of the
primary instance to successfully adhere to its operating guidelines, the
fault tolerant mechanisms of the disclosed embodiments can recognize
error conditions and readily fail over from the primary instance to the
backup instance.
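
By way of illustration only, the gating, comparison, and failover behavior described above might be sketched as follows. This is a minimal sketch and not the disclosed implementation: the names FaultTolerantGate, SelfShutdown, and make_accumulator are hypothetical, each instance is modeled as a plain callable, and a self-shutdown of the primary instance is assumed to surface as an exception.

```python
class SelfShutdown(Exception):
    """Hypothetical signal: an instance shut itself down because it could
    not adhere to its operating guidelines."""


class FaultTolerantGate:
    """Sits outside both instances; neither instance knows it exists."""

    def __init__(self, primary, backup):
        self.primary = primary
        self.backup = backup
        self.failed_over = False
        self.mismatches = []  # (input, primary output, backup output)

    def submit(self, item):
        if self.failed_over:
            # The backup has already taken over for the primary.
            return self.backup(item)
        try:
            primary_out = self.primary(item)
        except SelfShutdown:
            # Assumption: the backup has seen every input the primary
            # completed, so it can pick up at the input that failed.
            self.failed_over = True
            return self.backup(item)
        # Gate: the input reaches the backup only after the primary has
        # successfully processed it.
        backup_out = self.backup(item)
        if backup_out != primary_out:
            self.mismatches.append((item, primary_out, backup_out))
        return primary_out


def make_accumulator(fail_on=None):
    """Toy stateful instance: keeps a running total of its inputs and
    shuts itself down when it receives the value `fail_on`."""
    total = 0

    def step(item):
        nonlocal total
        if item == fail_on:
            raise SelfShutdown(f"cannot process {item!r}")
        total += item
        return total

    return step
```

In this sketch, a gate built from make_accumulator(fail_on=3) as the primary and make_accumulator() as the backup would serve inputs 1 and 2 from the primary, then fail over on input 3 with the backup's running total already in step with the primary's. A real system would also need to decide how to act on a recorded mismatch, which is left open here.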