A parallel processing architecture comprises a cluster of embedded processors
that share a common code distribution bus. Pages or blocks of code are concurrently
loaded into respective program memories of some or all of these processors (typically
all processors assigned to a particular task) over the code distribution bus, and
are executed in parallel by these processors. A task control processor determines
when all of the processors assigned to a particular task have finished executing
the current code page, and then loads a new code page (e.g., the next sequential
code page within a task) into the program memories of these processors for execution.
The processors within the cluster preferably share a common memory (one per cluster)
that is used to receive data inputs from, and to provide data outputs to, a
higher-level processor. Multiple interconnected clusters may be integrated within a common
integrated circuit device.
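
The page-by-page control flow described above can be illustrated with a minimal sketch. It is a software simulation under stated assumptions, not the claimed hardware: the names `Processor`, `run_task`, `load`, and `run` are hypothetical, a code page is modeled as a list of Python callables, the common memory is a plain list with one slot per processor, and parallel execution is stood in for by a sequential loop.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    """One embedded processor with its own program memory (hypothetical model)."""
    pid: int
    program_memory: list = field(default_factory=list)
    done: bool = True

    def load(self, page):
        # Receive a code page over the shared code distribution bus.
        self.program_memory = list(page)
        self.done = False

    def run(self, shared_memory):
        # Apply every operation in the loaded page to this processor's
        # slot in the cluster's common memory.
        for op in self.program_memory:
            shared_memory[self.pid] = op(shared_memory[self.pid])
        self.done = True


def run_task(processors, code_pages, shared_memory):
    """Task control loop: broadcast a page, wait for all, load the next."""
    pages_loaded = 0
    for page in code_pages:
        # Broadcast: the same page goes to every assigned processor.
        for p in processors:
            p.load(page)
        # Stand-in for parallel execution (sequential in this sketch).
        for p in processors:
            p.run(shared_memory)
        # Advance only once every assigned processor has finished the page.
        if not all(p.done for p in processors):
            raise RuntimeError("a processor has not finished the current page")
        pages_loaded += 1
    return pages_loaded


procs = [Processor(pid=i) for i in range(4)]
shared = [1, 2, 3, 4]                         # inputs from the higher-level processor
pages = [[lambda x: x + 1], [lambda x: x * 2]]  # two sequential code pages
loaded = run_task(procs, pages, shared)
print(loaded, shared)                         # prints: 2 [4, 6, 8, 10]
```

In this sketch the completion check plays the role of the task control processor: the second page is not broadcast until every processor has finished the first, and the results are left in the common memory for the higher-level processor to read.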