One embodiment of the present invention provides a system that facilitates
self-correcting memory in a shared-memory system. The system includes a
main memory coupled to a memory controller for reading and writing memory
locations and for marking memory locations that have been checked out to a
cache. The system also includes a processor cache for storing data
currently in use by a central processing unit. A communication channel
couples the processor cache to the memory controller. The memory
controller includes an error detection and correction mechanism, as well
as a mechanism for reading data from the processor cache when a
currently valid copy of the
data is checked out to the processor cache. When the data is returned to
the memory subsystem from the processor cache, the error detection and
correction mechanism corrects any errors in the data and stores a
corrected copy of the data in the main memory.
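
The flow described in this embodiment can be illustrated with a small software model. The sketch below is only a reading aid under stated assumptions: the class `MemoryController`, its methods `check_out`, `read`, and `write_back`, and the use of a Hamming(7,4) code on 4-bit values are all hypothetical choices made for illustration. The embodiment does not name a particular error-correcting code, word size, or interface, and an actual memory controller would be implemented in hardware.

```python
# Illustrative model only. Every name below is hypothetical and the
# Hamming(7,4) code is an assumed stand-in for the unspecified
# error detection and correction mechanism.

def hamming_encode(nibble):
    """Encode 4 data bits into a 7-bit Hamming(7,4) codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]          # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]          # covers positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]          # covers positions 4, 5, 6, 7
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]   # positions 1..7
    return sum(b << i for i, b in enumerate(bits))


def hamming_decode(word):
    """Return (nibble, corrected), repairing any single-bit error."""
    bits = [(word >> i) & 1 for i in range(7)]
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos - 1]:
            syndrome ^= pos          # XOR of the positions of all set bits
    corrected = syndrome != 0
    if corrected:
        bits[syndrome - 1] ^= 1      # the syndrome is the failing position
    nibble = bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
    return nibble, corrected


class MemoryController:
    def __init__(self, size):
        self.main_memory = [hamming_encode(0)] * size
        self.checked_out = [False] * size   # marks locations held by the cache
        self.cache = {}                     # address -> codeword in the cache

    def check_out(self, addr):
        """Hand a location to the cache and mark it as checked out."""
        nibble, _ = hamming_decode(self.main_memory[addr])
        self.cache[addr] = hamming_encode(nibble)
        self.checked_out[addr] = True

    def read(self, addr):
        """Read data, going to the cache when it holds the valid copy."""
        source = self.cache if self.checked_out[addr] else self.main_memory
        nibble, _ = hamming_decode(source[addr])
        return nibble

    def write_back(self, addr):
        """Accept data returned from the cache, correct it, and store it."""
        nibble, corrected = hamming_decode(self.cache.pop(addr))
        self.main_memory[addr] = hamming_encode(nibble)
        self.checked_out[addr] = False
        return corrected


mc = MemoryController(4)
mc.main_memory[2] = hamming_encode(0b1011)
mc.check_out(2)                        # location 2 is now marked checked out
mc.cache[2] = hamming_encode(0b0110)   # the processor modifies its cached copy
value_while_checked_out = mc.read(2)   # served from the cache, not main memory
mc.cache[2] ^= 1 << 4                  # simulate a single-bit error
was_corrected = mc.write_back(2)       # controller repairs and stores the data
```

In this model a read of a checked-out location is served from the cache copy, and the write-back path is the only place where a corrected copy is stored in main memory, mirroring the sequence described above. A Hamming(7,4) code repairs only single-bit errors; detecting double-bit errors would need an additional overall parity bit, which is omitted here for brevity.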