A method, apparatus, and article of manufacture for detecting and
correcting memory device failures, including detecting errors in data stored
in a memory device from the data transacted with a processor, correcting
the detected errors in the data transacted with the processor, tracking
the detected errors in the memory device, determining when the memory
device has failed based upon the tracked detected errors and resetting the
memory device when the memory device fails testing; and further,
identifying erroneous latch-ups detected soon after power-up and
correcting errors so that no erroneous data is transacted with the
processor.
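
The detect, correct, track, and reset sequence summarized above can be illustrated with a minimal software sketch. It is not the disclosed implementation. It assumes a Hamming(7,4) single-error-correcting code as a stand-in for whatever error-correcting scheme the apparatus uses, and the names MonitoredMemory, error_threshold, inject_fault, and reset are hypothetical, introduced only for illustration. The reset here is a stub that counts resets; a real device would be power-cycled or re-initialized.

```python
from functools import reduce

# Assumption: Hamming(7,4) stands in for the (unspecified) correction code.
DATA_POSITIONS = (3, 5, 6, 7)  # 1-based codeword positions that hold data bits


def syndrome(bits):
    """XOR of the 1-based positions of all set bits; 0 means no error seen."""
    return reduce(lambda acc, pos: acc ^ pos,
                  (i for i in range(1, 8) if bits[i]), 0)


def encode(nibble):
    """Encode a 4-bit value as a Hamming(7,4) codeword (list indexed 1..7)."""
    bits = [0] * 8  # index 0 is unused so indices match bit positions
    for k, pos in enumerate(DATA_POSITIONS):
        bits[pos] = (nibble >> k) & 1
    s = syndrome(bits)
    for p in (1, 2, 4):  # set parity bits so the full syndrome becomes 0
        bits[p] = 1 if s & p else 0
    return bits


class MonitoredMemory:
    """Hypothetical model: memory whose reads are checked and corrected."""

    def __init__(self, size, error_threshold=3):
        self.cells = [encode(0) for _ in range(size)]
        self.error_threshold = error_threshold  # assumed failure criterion
        self.error_count = 0  # tracked detected errors
        self.resets = 0       # number of times the device was reset

    def write(self, addr, nibble):
        self.cells[addr] = encode(nibble)

    def inject_fault(self, addr, position):
        """Flip one stored bit to simulate an upset (for demonstration)."""
        self.cells[addr][position] ^= 1

    def read(self, addr):
        """Return corrected data so the caller never sees a single-bit error."""
        word = self.cells[addr]
        s = syndrome(word)
        if s:                      # detect
            word[s] ^= 1           # correct in place (also scrubs the cell)
            self.error_count += 1  # track
            if self.error_count >= self.error_threshold:
                self.reset()       # device deemed failed: reset it
        return sum(word[pos] << k for k, pos in enumerate(DATA_POSITIONS))

    def reset(self):
        """Stub for a device reset; clears the tracked error count."""
        self.resets += 1
        self.error_count = 0


mem = MonitoredMemory(size=4, error_threshold=2)
mem.write(0, 0b1011)
mem.inject_fault(0, 5)  # simulate a single-bit error in the stored word
value = mem.read(0)     # the processor side receives corrected data
```

In this sketch the failure decision is a simple count threshold. The abstract states only that failure is determined from the tracked errors, so a rate-based or per-address criterion would fit the same structure.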