One embodiment of the present invention provides a system that efficiently
marks cache lines in a multi-processor computer system. The system starts
by receiving a load request for a cache line from a requesting thread.
Upon receiving the load request, the system loads a copy of the cache
line into a local cache for the requesting thread. The system then
load-marks the copy of the cache line in the local cache by incrementing
a reader count value contained in metadata for that copy. The increment
is performed regardless of the cache coherency protocol status of the
copy, so the system updates the metadata in the local copy of the cache
line without obtaining exclusive access to the cache line.
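
The load-marking step described above can be illustrated with a minimal software sketch. This is a model under stated assumptions, not the embodiment's hardware design: the type `cache_line_copy`, the functions `load_line` and `load_mark`, the 64-byte line size, and the MESI-style state names are all hypothetical and chosen only for illustration. The point it shows is that the reader count in the local copy's metadata is incremented without inspecting or upgrading the coherency state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 64 /* assumed line size; the source does not specify one */

/* Hypothetical MESI-style states; the source does not name a protocol. */
typedef enum {
    STATE_INVALID,
    STATE_SHARED,
    STATE_EXCLUSIVE,
    STATE_MODIFIED
} coherency_state;

/* A local copy of a cache line together with its metadata. */
typedef struct {
    uint8_t data[LINE_SIZE];
    coherency_state state;  /* coherency protocol status of this copy */
    uint32_t reader_count;  /* metadata: number of load-marks on this copy */
} cache_line_copy;

/* Load a copy of a line into the requesting thread's local cache.
 * The copy starts with no load-marks in this simplified model. */
static void load_line(cache_line_copy *local, const uint8_t *src,
                      coherency_state state)
{
    assert(local != NULL && src != NULL);
    memcpy(local->data, src, LINE_SIZE);
    local->state = state;
    local->reader_count = 0;
}

/* Load-mark the local copy: increment the reader count in its metadata.
 * The coherency state is neither checked nor changed, so no exclusive
 * access to the line is requested. */
static void load_mark(cache_line_copy *local)
{
    assert(local != NULL);
    local->reader_count++;
}
```

In this sketch, a thread that calls `load_line` with `STATE_SHARED` and then `load_mark` ends up with a reader count of 1 while the copy remains in the shared state, which mirrors the property that marking does not require exclusive access.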