One embodiment of the present invention includes a heterogeneous,
high-performance, scalable processor having at least one W-type
sub-processor capable of processing W bits in parallel, W being an
integer value, and at least one
N-type sub-processor capable of processing N bits in parallel, N being an
integer value smaller than W by a factor of two. The processor further
includes a shared bus coupling the at least one W-type sub-processor and
the at least one N-type sub-processor, and shared memory coupled to the at
least one W-type sub-processor and the at least one N-type sub-processor,
wherein the W-type sub-processor rearranges memory to accommodate
execution of applications, allowing for fast operations.
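
The relationship between the two sub-processor widths can be illustrated with a minimal sketch. It is not the claimed implementation. The concrete widths (W = 32, N = 16), the SubProcessor class, and the notion of a "pass" over a block of data are assumptions introduced only for illustration; the embodiment itself specifies only that N is smaller than W by a factor of two.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubProcessor:
    """Hypothetical model of a sub-processor that handles a fixed number of bits per pass."""
    name: str
    width_bits: int  # bits processed in parallel in one pass

    def passes_needed(self, total_bits: int) -> int:
        # Ceiling division: a partially filled final pass still counts as a pass.
        return -(-total_bits // self.width_bits)


# Assumed example widths; the text only requires N to be half of W.
W = 32
N = W // 2

w_type = SubProcessor("W-type", W)
n_type = SubProcessor("N-type", N)


def passes_for(total_bits: int) -> dict:
    """Return the number of passes each sub-processor type needs for a block of data."""
    return {p.name: p.passes_needed(total_bits) for p in (w_type, n_type)}


if __name__ == "__main__":
    print(passes_for(128))
```

Under these assumptions, a block whose size is a multiple of W takes twice as many passes on the N-type sub-processor as on the W-type sub-processor, which is one simple way to see why wide and narrow workloads might be routed to different sub-processor types.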