Software-based cache coherence protocol. A processing unit may execute a
memory request using a processor thread. In response to detecting a cache
hit to a shared line or a cache miss associated with the memory request, a cache
may provide both a trap signal and coherence information to the processor
thread of the processing unit. After receiving the trap signal and the
coherence information, the processor thread may perform a cache coherence
operation for the memory request using at least the received coherence
information. The processing unit may include a plurality of processor
threads and a load balancer. The load balancer may receive coherence
requests from one or more remote processing units and distribute the
received coherence requests across the plurality of processor threads.
The load balancer may preferentially distribute the received coherence
requests across the plurality of processor threads based on the operation
state of the processor threads.
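
The flow described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the names ThreadState, ProcessorThread, LoadBalancer, on_trap, on_remote_request, and dispatch are hypothetical, the two thread states (IDLE, BUSY) are an assumed stand-in for the "operation state", and the preference for idle threads with a round-robin fallback is one possible policy chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class ThreadState(Enum):
    # Assumed operation states; the source does not enumerate them.
    IDLE = 0
    BUSY = 1


@dataclass
class ProcessorThread:
    tid: int
    state: ThreadState = ThreadState.IDLE
    handled: list = field(default_factory=list)

    def on_trap(self, coherence_info):
        # Called when the cache signals a trap (miss, or hit to a shared
        # line) and hands over coherence information. The thread itself
        # would then run the coherence operation in software.
        self.handled.append(("local", coherence_info))

    def on_remote_request(self, request):
        # Called when the load balancer assigns a remote coherence request.
        self.handled.append(("remote", request))


class LoadBalancer:
    def __init__(self, threads):
        self.threads = threads
        self._next = 0  # round-robin cursor

    def dispatch(self, request):
        # Prefer idle threads; fall back to all threads if none is idle.
        idle = [t for t in self.threads if t.state is ThreadState.IDLE]
        pool = idle if idle else self.threads
        target = pool[self._next % len(pool)]
        self._next += 1
        target.on_remote_request(request)
        return target.tid


if __name__ == "__main__":
    threads = [
        ProcessorThread(0, ThreadState.BUSY),
        ProcessorThread(1, ThreadState.IDLE),
        ProcessorThread(2, ThreadState.BUSY),
    ]
    balancer = LoadBalancer(threads)
    print(balancer.dispatch({"op": "invalidate", "addr": 0x1000}))
    threads[0].on_trap({"addr": 0x2000, "line_state": "shared"})
    print(threads[0].handled)
```

In this sketch a remote request goes to the only idle thread when one exists, while a locally trapped request is handled by the thread that issued the memory request.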