A mechanism to process units of data associated with a dependent data
stream using different threads of execution and a common data structure
in memory. For this processing, the common data structure in memory is
accessed with a single read operation and a single write operation. The
folding of multiple read-modify-write memory operations in such a manner
for multiple multi-threaded stages of processing includes controlling a
first stage, which operates on the same data unit as a second stage, to
pass context state information to the second stage for coherency.
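
The folding described above can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes two stages, each running in its own thread, and a hypothetical `CountingStore` class standing in for the common data structure in memory. The names `CountingStore`, `run_folded`, `run_unfolded`, and the `units` and `bytes` fields are all invented for illustration. The first stage performs the only read and hands the in-flight context state to the second stage through a queue; the second stage applies its own update and performs the only write. As a simplification, each data unit in the sketch touches a distinct entry, so back-to-back units that depend on the same entry are not handled.

```python
import queue
import threading


class CountingStore:
    """Hypothetical stand-in for the common data structure in memory."""

    def __init__(self, entries):
        self._entries = {k: dict(v) for k, v in entries.items()}
        self._lock = threading.Lock()
        self.reads = 0
        self.writes = 0

    def read(self, key):
        with self._lock:
            self.reads += 1
            return dict(self._entries[key])

    def write(self, key, state):
        with self._lock:
            self.writes += 1
            self._entries[key] = dict(state)

    def peek(self, key):
        # Inspection helper for the demo; not counted as a memory access.
        with self._lock:
            return dict(self._entries[key])


def run_folded(store, units):
    """Two threaded stages share one read and one write per data unit."""
    handoff = queue.Queue()

    def first_stage():
        for key, length in units:
            state = store.read(key)            # the single read
            state["units"] += 1                # first stage's modification
            handoff.put((key, length, state))  # pass context state onward
        handoff.put(None)                      # signal: no more units

    def second_stage():
        while True:
            item = handoff.get()
            if item is None:
                break
            key, length, state = item
            state["bytes"] += length           # second stage's modification
            store.write(key, state)            # the single write

    threads = [threading.Thread(target=first_stage),
               threading.Thread(target=second_stage)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def run_unfolded(store, units):
    """Baseline: each stage performs its own read-modify-write."""
    for key, length in units:
        state = store.read(key)
        state["units"] += 1
        store.write(key, state)
        state = store.read(key)
        state["bytes"] += length
        store.write(key, state)
```

For two data units on distinct entries, `run_folded` issues two reads and two writes in total, while `run_unfolded` issues four of each and arrives at the same final state. Passing the state through the queue is what keeps the second stage coherent with the first without a second read.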