Apparatus and methods for autonomously identifying and mitigating
soft-errors affecting integrated circuit memory storage devices are
provided. A soft-error mitigation process is invoked upon finding that an
integrated circuit memory device is affected by a parity error. In a
staged approach, unused memory regions of the integrated circuit memory
device are reinitialized; if a redundant deployment is in place, the
subsystem corresponding to the affected integrated circuit memory device
is reset; memory regions having copies of contents thereof stored at
remote locations are rewritten with obtained copies of the contents; and
memory regions storing contents which are generated at run-time are
reinitialized. Directed parity error scans are employed at each stage. If
the parity error persists, either the apparatus or the subsystem
corresponding to the affected integrated circuit memory device is reset
during a maintenance window. Advantages are derived from a run-time
soft-error mitigation process that increases availability and reduces both
maintenance overhead and the need for hardware replacement.
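The staged escalation can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the claimed apparatus. It assumes one even-parity bit per word. It also assumes region classes UNUSED, REMOTE_COPY and RUNTIME, and a full-device parity scan after every stage. The names Region, Device, mitigate, remote_copies, regenerate, reset_subsystem and the stuck flag are all hypothetical. The stuck flag models a hard fault that no rewrite can clear, so the final outcome is a deferred reset.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Dict, List


class RegionKind(Enum):
    UNUSED = auto()       # holds no live data; safe to reinitialize
    REMOTE_COPY = auto()  # a copy of the contents is kept at a remote location
    RUNTIME = auto()      # contents are generated at run-time and can be rebuilt


def parity_bit(word: int) -> int:
    # Parity bit chosen so that the word's bits plus the parity bit have even weight.
    return bin(word).count("1") % 2


@dataclass
class Region:
    name: str
    kind: RegionKind
    data: List[int]
    parity: List[int]
    stuck: bool = False  # hypothetical: models a hard fault that rewriting cannot clear

    def parity_ok(self) -> bool:
        return all(parity_bit(w) == p for w, p in zip(self.data, self.parity))

    def write(self, words: List[int]) -> None:
        # A rewrite recomputes parity, which clears a transient (soft) error.
        self.data = list(words)
        self.parity = [parity_bit(w) for w in self.data]
        if self.stuck and self.data:
            self.data[0] ^= 1  # the faulty cell flips back, so parity fails again


@dataclass
class Device:
    regions: List[Region]
    redundant: bool                         # hypothetical: a standby peer takes over on reset
    remote_copies: Dict[str, List[int]]     # hypothetical remote store, keyed by region name
    regenerate: Callable[[str], List[int]]  # hypothetical run-time content generator
    reset_subsystem: Callable[[], None]     # hypothetical subsystem reset hook

    def scan(self) -> List[str]:
        # Directed parity scan: names of regions whose parity check fails.
        return [r.name for r in self.regions if not r.parity_ok()]

    def rewrite(self, kind: RegionKind, source: Callable[[Region], List[int]]) -> None:
        for r in self.regions:
            if r.kind is kind:
                r.write(source(r))


def mitigate(dev: Device) -> str:
    """Run the staged mitigation and return the stage that cleared the error."""
    if not dev.scan():
        return "no_error"
    stages = [
        ("reinit_unused",
         lambda: dev.rewrite(RegionKind.UNUSED, lambda r: [0] * len(r.data))),
        ("reset_subsystem", dev.reset_subsystem if dev.redundant else None),
        ("rewrite_remote",
         lambda: dev.rewrite(RegionKind.REMOTE_COPY, lambda r: dev.remote_copies[r.name])),
        ("reinit_runtime",
         lambda: dev.rewrite(RegionKind.RUNTIME, lambda r: dev.regenerate(r.name))),
    ]
    for name, action in stages:
        if action is None:
            continue  # no redundant deployment, so the reset stage is skipped
        action()
        if not dev.scan():  # directed parity scan after every stage
            return name
    return "reset_in_maintenance_window"
```

The stages are ordered from least to most disruptive. Each scan lets the process stop as soon as the error clears. An error that survives every stage is treated as a likely hard fault and deferred to a maintenance-window reset.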