The present invention relates to a method and system for efficiently
identifying errant processes in a computer system using an operating
system (OS) error recovery method that determines whether the error
caused by the errant process is recoverable and, if so, recovers from
the error. The method and system of the present invention operate after
standard Error Correcting Code (ECC) and parity check bit methods and
systems have failed to recover from the error. In accordance with
an embodiment of the present invention, the method and system include
detecting an error during instruction execution, storing a physical
address of an errant process that caused the error, and storing an
execution instruction pointer (IP) in a processor that includes at least
one critical memory structure to detect an error and processor error
processing logic hardware coupled to the at least one critical memory
structure. The processor error processing logic hardware is to store a
physical address of an errant process that caused the error, store an
execution instruction pointer (IP) in an interruption instruction pointer
(IIP), determine a first virtual address from an operating system mapping
table, determine a second virtual address from a translation look-aside
buffer, and identify the errant process if the physical address and the
second virtual address match the physical address and the first virtual
address.
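
The matching step described above can be illustrated with a minimal
sketch in Python. It is not the claimed implementation. It assumes that
addresses are already page-aligned, that the operating system mapping
table can be modeled as a list of per-process entries, and that the
translation look-aside buffer can be modeled as a dictionary keyed by
physical address. The names Mapping, identify_errant_process, os_table,
and tlb are hypothetical and are introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Mapping:
    """One hypothetical OS mapping-table entry (page-aligned addresses)."""
    pid: int
    virtual_addr: int
    physical_addr: int


def identify_errant_process(
    error_pa: int,
    os_table: List[Mapping],
    tlb: Dict[int, int],
) -> Optional[int]:
    """Return the pid whose (physical, virtual) pair matches the TLB's.

    error_pa : stored physical address associated with the error
    os_table : modeled OS mapping table (source of the first virtual address)
    tlb      : modeled TLB, physical -> virtual (source of the second one)
    """
    # Second virtual address: what the TLB holds for the stored physical address.
    second_va = tlb.get(error_pa)
    if second_va is None:
        return None  # no TLB entry, so no process can be identified here

    for entry in os_table:
        # First virtual address: taken from the OS mapping-table entry.
        first_va = entry.virtual_addr
        if entry.physical_addr == error_pa and first_va == second_va:
            return entry.pid
    return None


# Illustrative data: two processes map the same physical page at
# different virtual addresses; the TLB entry disambiguates them.
os_table = [
    Mapping(pid=101, virtual_addr=0x4000, physical_addr=0x9000),
    Mapping(pid=202, virtual_addr=0x7000, physical_addr=0x9000),
    Mapping(pid=303, virtual_addr=0x4000, physical_addr=0xA000),
]
tlb = {0x9000: 0x7000}

errant_pid = identify_errant_process(0x9000, os_table, tlb)
```

In this sketch the physical address alone is ambiguous, because two
table entries share it. Comparing the pair of physical and virtual
addresses is what singles out one process, which is the role the
two-address match plays in the description above.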