Disclosed are a method and mechanisms for checkpointing objects,
processes and other components of a multithreaded application program,
based on the leader-follower strategy of semi-active or passive
replication, in cases where it is not possible to stop and checkpoint
all of the threads of the object, process or other component
simultaneously. Separate checkpoints are
generated for the local state of each thread and for the data that are
shared between threads and are protected by mutexes. The invention
enables different threads to be checkpointed at different times in such a
way that the checkpoints restore thread state that is consistent
between the existing replicas and a new or recovering replica, even
though the threads operate concurrently and asynchronously. The
checkpoint of the shared data is piggybacked onto regular messages along
with ordering information that determines the order in which the mutexes
are granted to the threads.
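
The piggybacking idea described above can be illustrated with a minimal,
single-process sketch. It is a simplified simulation under stated
assumptions, not the claimed mechanisms: the names Message, Leader,
NewReplica, start_checkpoint, grant, send and receive are all
hypothetical, the threads are modeled as named critical-section
functions, and the shared data are plain dictionaries. The leader
numbers each mutex grant, snapshots the shared data at the first grant
after a checkpoint is requested, and attaches both the snapshot and the
grant order to the next regular message. The new replica installs the
snapshot and then replays only the grants that follow it, in the
leader's order.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical record: (mutex name, thread name, per-mutex grant number)
Grant = Tuple[str, str, int]


@dataclass
class Message:
    """A regular message carrying piggybacked ordering and checkpoint data."""
    payload: str
    grants: List[Grant] = field(default_factory=list)
    # mutex name -> (grant number the snapshot precedes, snapshot of the data)
    checkpoints: Dict[str, Tuple[int, dict]] = field(default_factory=dict)


class Leader:
    def __init__(self, shared: Dict[str, dict]):
        self.shared = shared                      # mutex name -> protected data
        self.seq = {m: 0 for m in shared}         # grants issued per mutex
        self.to_checkpoint = set()                # mutexes still to be snapshotted
        self.grants: List[Grant] = []
        self.checkpoints: Dict[str, Tuple[int, dict]] = {}

    def start_checkpoint(self) -> None:
        """Assumed trigger: a new or recovering replica needs the shared data."""
        self.to_checkpoint = set(self.shared)

    def grant(self, mutex: str, thread: str,
              critical_section: Callable[[dict], None]) -> None:
        self.seq[mutex] += 1
        if mutex in self.to_checkpoint:
            # Snapshot the data before this grant's critical section runs.
            snapshot = copy.deepcopy(self.shared[mutex])
            self.checkpoints[mutex] = (self.seq[mutex], snapshot)
            self.to_checkpoint.discard(mutex)
        self.grants.append((mutex, thread, self.seq[mutex]))
        critical_section(self.shared[mutex])

    def send(self, payload: str) -> Message:
        """Attach pending grants and snapshots to an ordinary message."""
        msg = Message(payload, self.grants, self.checkpoints)
        self.grants, self.checkpoints = [], {}
        return msg


class NewReplica:
    def __init__(self, sections: Dict[str, Callable[[dict], None]]):
        self.sections = sections                  # thread name -> critical section
        self.shared: Dict[str, dict] = {}
        self.next_seq: Dict[str, int] = {}        # next grant to replay per mutex

    def receive(self, msg: Message) -> None:
        for mutex, (seq, snapshot) in msg.checkpoints.items():
            self.shared[mutex] = copy.deepcopy(snapshot)
            self.next_seq[mutex] = seq
        for mutex, thread, seq in msg.grants:
            # Grants that precede the snapshot are already reflected in it.
            if self.next_seq.get(mutex) == seq:
                self.sections[thread](self.shared[mutex])
                self.next_seq[mutex] = seq + 1


def add_one(data: dict) -> None:
    data["n"] += 1


def double(data: dict) -> None:
    data["n"] *= 2


leader = Leader({"m": {"n": 1}})
leader.grant("m", "t1", add_one)      # happens before the checkpoint starts
leader.start_checkpoint()
leader.grant("m", "t2", double)       # snapshot taken here, then the update
leader.grant("m", "t1", add_one)
msg = leader.send("regular reply")

replica = NewReplica({"t1": add_one, "t2": double})
replica.receive(msg)
```

In this sketch the replica skips the first grant because the snapshot
already reflects it, then replays the remaining two grants in the
leader's order, so its copy of the shared data ends up equal to the
leader's. Replaying the same two grants in the opposite order would
yield a different value, which is why the ordering information travels
with the checkpoint.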