In one embodiment, a processor comprises a plurality of processor cores
and an interconnect to which the plurality of processor cores are
coupled. Each of the plurality of processor cores comprises at least one
translation lookaside buffer (TLB). A first processor core is configured
to broadcast a demap command on the interconnect responsive to executing
a demap operation. The demap command identifies one or more translations
to be invalidated in the TLBs, and the remaining processor cores are
configured to invalidate those translations in their respective TLBs.
The remaining processor cores are configured to transmit a response to
the first processor core, and the first processor core is configured to
delay continued processing subsequent to the demap operation until a
response has been received from each of the remaining processor cores.
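The broadcast, invalidate, respond, and wait sequence described above can
be modeled in software as a rough sketch. The model below is a
hypothetical illustration, not the hardware design: the names Core,
Interconnect, DemapCommand, and snoop_loop are invented, each TLB is
assumed to be a dictionary keyed by virtual page number, cores are
modeled as threads, and the interconnect is modeled as per-core queues.
It also assumes the first core invalidates the translations in its own
TLB, which the description above does not state explicitly.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class DemapCommand:
    """Hypothetical demap command: names the translations (virtual pages) to drop."""
    initiator: int
    pages: frozenset


class Core:
    def __init__(self, core_id, interconnect):
        self.core_id = core_id
        self.interconnect = interconnect
        self.tlb = {}                    # assumed TLB model: virtual page -> physical page
        self.lock = threading.Lock()     # guards self.tlb
        self.inbox = queue.Queue()       # demap commands delivered by the interconnect
        self.responses = queue.Queue()   # responses to demaps this core initiated

    def _invalidate(self, pages):
        with self.lock:
            for page in pages:
                self.tlb.pop(page, None)

    def snoop_loop(self):
        # Behavior of a "remaining" core: invalidate, then respond to the initiator.
        while True:
            cmd = self.inbox.get()
            if cmd is None:              # shutdown sentinel
                return
            self._invalidate(cmd.pages)
            self.interconnect.respond(cmd.initiator, self.core_id)

    def execute_demap(self, pages):
        # Behavior of the first core: broadcast the command, then stall.
        cmd = DemapCommand(self.core_id, frozenset(pages))
        self._invalidate(cmd.pages)      # assumption: local TLB is also invalidated
        expected = self.interconnect.broadcast(cmd)
        responders = set()
        while len(responders) < expected:
            responders.add(self.responses.get())   # blocks until a response arrives
        return responders                # only now may processing continue


class Interconnect:
    def __init__(self, num_cores):
        self.cores = [Core(i, self) for i in range(num_cores)]

    def broadcast(self, cmd):
        # Deliver to every core except the initiator; return the number of responses owed.
        targets = [c for c in self.cores if c.core_id != cmd.initiator]
        for core in targets:
            core.inbox.put(cmd)
        return len(targets)

    def respond(self, initiator, responder_id):
        self.cores[initiator].responses.put(responder_id)

    def shutdown(self):
        for core in self.cores:
            core.inbox.put(None)


if __name__ == "__main__":
    bus = Interconnect(4)
    for core in bus.cores:
        core.tlb.update({0x10: 0xA0, 0x11: 0xA1})
    threads = [threading.Thread(target=c.snoop_loop, daemon=True) for c in bus.cores]
    for t in threads:
        t.start()
    print(sorted(bus.cores[0].execute_demap({0x10})))   # [1, 2, 3]
    bus.shutdown()
    for t in threads:
        t.join()
```

Because each remaining core invalidates before it responds, once
execute_demap returns no core in the model can still hold the demapped
translation. That ordering is the property the delay in the first core
is intended to guarantee.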