A multi-threaded embedded processor includes an on-chip deterministic
memory (e.g., a scratch memory or locked cache) that persistently stores
all instructions associated with one or more pre-selected high-use threads.
The processor executes general (non-selected) threads by reading
instructions from an inexpensive external memory, e.g., by way of an
on-chip standard cache memory, or by another potentially slow,
non-deterministic operation, such as direct execution from the external
memory, that can cause the processor to stall while waiting for
instructions to arrive. When a cache miss or other blocking event occurs
during execution of a general thread, the processor switches to a
pre-selected thread, whose execution with zero or minimal delay is
guaranteed by the deterministic memory, thereby using otherwise
wasted processor cycles until the blocking event completes.
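
The switching behavior described above can be illustrated with a small cycle-level simulation. This is only a sketch under stated assumptions, not the disclosed hardware: the names `Thread`, `simulate`, and `MISS_PENALTY` are hypothetical, every instruction is assumed to take one cycle, and a pre-selected thread is assumed to always have work available from the deterministic memory.

```python
from dataclasses import dataclass

MISS_PENALTY = 3  # hypothetical number of stall cycles per blocking event


@dataclass
class Thread:
    name: str
    retired: int = 0  # instructions completed so far


def simulate(miss_pattern, miss_penalty=MISS_PENALTY):
    """Model a general thread whose fetches hit or miss per miss_pattern.

    miss_pattern: iterable of bools; True means the fetch is a blocking event.
    Returns (general, selected, total_cycles).
    """
    general = Thread("general")
    selected = Thread("pre-selected")
    cycles = 0
    for is_miss in miss_pattern:
        if is_miss:
            # Blocking event: instead of stalling, run the pre-selected
            # thread from deterministic memory (assumed 1 cycle each).
            for _ in range(miss_penalty):
                selected.retired += 1
                cycles += 1
        # The fetch has completed; the general thread retires its instruction.
        general.retired += 1
        cycles += 1
    return general, selected, cycles


general, selected, cycles = simulate([False, True, False, True])
print(general.retired, selected.retired, cycles)
```

In this model, the cycles that a single-threaded processor would spend stalled are instead counted as retired instructions of the pre-selected thread, while the total cycle count is unchanged.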