A mechanism and method for maintaining a consistent state in a
non-volatile random access memory system without constraining normal
computer operation is provided, thereby enabling a computer system to
recover from faults, power loss, or other computer system failures without
a loss of data or processing continuity. In a typical computer system,
checkpointing is very slow, very inefficient, or produces checkpoint data
that would not survive a power failure. In embodiments of the present
invention, a
non-volatile random access memory system is used to capture checkpointed
data, and can later be used to roll back the computer system to a previous
checkpoint. This structure and protocol enable a computer system to
recover quickly and efficiently from faults, power loss, or other
computer system failures.
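
The checkpoint and rollback behavior described above can be illustrated with a minimal sketch. It is a toy software model, not the claimed hardware mechanism: the class name CheckpointedMemory, its methods, and the dirty-address tracking are all assumptions introduced for illustration, and the list named nvram merely stands in for a non-volatile random access memory system. A real design would also need the checkpoint copy itself to be atomic with respect to power loss, which this sketch does not attempt.

```python
class CheckpointedMemory:
    """Toy model (hypothetical): working memory plus a checkpoint copy.

    `working` stands in for ordinary main memory whose contents are lost
    or untrusted after a failure; `nvram` stands in for the non-volatile
    memory that holds the last checkpointed state.
    """

    def __init__(self, size):
        self.working = [0] * size   # state seen by normal operation
        self.nvram = [0] * size     # last checkpointed state (assumed durable)
        self.dirty = set()          # addresses written since the last checkpoint

    def write(self, addr, value):
        # Normal operation proceeds unconstrained; we only record what changed.
        self.working[addr] = value
        self.dirty.add(addr)

    def read(self, addr):
        return self.working[addr]

    def checkpoint(self):
        # Capture only the locations modified since the previous checkpoint.
        for addr in self.dirty:
            self.nvram[addr] = self.working[addr]
        self.dirty.clear()

    def rollback(self):
        # After a fault, restore the working state from the last checkpoint.
        self.working = list(self.nvram)
        self.dirty.clear()


mem = CheckpointedMemory(4)
mem.write(0, 10)
mem.checkpoint()        # state {0: 10} is now captured
mem.write(0, 99)        # uncheckpointed update
mem.write(1, 7)         # uncheckpointed update
mem.rollback()          # simulate recovery after a failure
print(mem.read(0), mem.read(1))
```

In this sketch, writes made after the last checkpoint are discarded on rollback, so the model returns to the most recent captured state rather than to the moment of failure.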