A computer system includes a primary processor and a secondary processor running
in lockstep. The lockstep may or may not be synchronous. Errors occurring in the
primary processor or the secondary processor are reported to an error-handling
module. If the error is a recoverable error, the state of one of the processors
is saved and the processors are restarted using the saved state. In addition to
error reporting by the processors, the operation of the processors is cross-checked
to detect any divergence between them.
If the divergence is reported to be due to a recoverable error, the state of
one of the processors is saved and the processors are restarted using the saved
state. Procedures are also disclosed to ensure that data corruption does not propagate
onto an associated network, and to ensure that the system is not lost as a network
resource during processor restart.
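
The recovery flow summarized above can be illustrated with a minimal sketch. It is not the disclosed implementation: the names Processor, ErrorHandler, and lockstep_step are hypothetical, the "processors" are toy accumulators, and two choices are assumptions made only for illustration. First, the primary processor is treated as the one whose state is saved. Second, any divergence is treated as recoverable. The sketch also withholds output whenever the cross check fails, as one simple way to keep a corrupted value from leaving the system.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Processor:
    """Toy processor (hypothetical): its whole state is a small dict."""
    name: str
    state: dict = field(default_factory=lambda: {"pc": 0, "acc": 0})

    def step(self, value: int) -> int:
        # Toy "instruction": accumulate the input and advance the counter.
        self.state["acc"] += value
        self.state["pc"] += 1
        return self.state["acc"]

    def restart(self, saved: dict) -> None:
        # Restart from a saved state; copy so the processors never share a dict.
        self.state = deepcopy(saved)


class ErrorHandler:
    """Hypothetical error-handling module for a primary/secondary pair."""

    def __init__(self, primary: Processor, secondary: Processor) -> None:
        self.primary = primary
        self.secondary = secondary
        self.restarts = 0

    def report(self, healthy: Processor, recoverable: bool) -> bool:
        """Handle a reported error or divergence; return True if recovered."""
        if not recoverable:
            return False
        # Save the state of one processor, then restart both from it.
        saved = deepcopy(healthy.state)
        self.primary.restart(saved)
        self.secondary.restart(saved)
        self.restarts += 1
        return True


def lockstep_step(handler: ErrorHandler, value: int) -> Optional[int]:
    """Run one step on both processors and cross-check their outputs."""
    out_primary = handler.primary.step(value)
    out_secondary = handler.secondary.step(value)
    if out_primary != out_secondary:
        # Assumption: the primary is trusted and the divergence is recoverable.
        handler.report(healthy=handler.primary, recoverable=True)
        return None  # withhold the output of a step that failed the cross check
    return out_primary
```

In this sketch a caller would forward a result to the network only when lockstep_step returns a value other than None. How a real system decides which processor's state to trust, and whether an error is recoverable, is outside what the sketch models.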