Methods for optimizing memory unit usage to maximize packet throughput for
a multiprocessor multithreaded architecture. One method employs a first
phase of a software algorithm to allocate data structures to memory units
in which the data structures are stored and accessed during processing
operations. Data structures are preferentially allocated to lower-latency
memory units while satisfying the capacity and bandwidth constraints of
each memory unit. A second phase of the
algorithm may be employed to tune the allocation: the performance levels
of the initial allocation and of subsequent reallocations are simulated
for the environment in which the memory units and data structures are to
be implemented, and the allocation providing the best simulated
performance level is selected. The simulated environment may include
network processor unit (NPU) environments, with the performance level
comprising a measure of packet throughput.
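The two phases can be illustrated with a minimal Python sketch. The abstract does not specify the first-phase heuristic, the reallocation strategy, or the simulator, so each is an assumption here. Phase one greedily places the most frequently accessed data structures in the lowest-latency memory unit that still satisfies capacity and bandwidth. Phase two hill-climbs over single-structure moves. `make_toy_simulator` is a hypothetical stand-in for an NPU simulation that treats throughput as inversely proportional to latency-weighted accesses per packet. All class names, function names, memory-unit names, and numbers are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

@dataclass(frozen=True)
class MemoryUnit:
    name: str
    latency: float    # access latency (arbitrary units); lower is better
    capacity: int     # storage available (arbitrary units)
    bandwidth: float  # accesses per packet the unit can sustain

@dataclass(frozen=True)
class DataStructure:
    name: str
    size: int         # storage required
    accesses: float   # accesses per packet (bandwidth demand)

Allocation = Dict[str, str]  # data-structure name -> memory-unit name

def is_feasible(alloc: Allocation, structs: List[DataStructure],
                units: List[MemoryUnit]) -> bool:
    """Check every memory unit's capacity and bandwidth constraints."""
    used_cap = {u.name: 0 for u in units}
    used_bw = {u.name: 0.0 for u in units}
    for ds in structs:
        used_cap[alloc[ds.name]] += ds.size
        used_bw[alloc[ds.name]] += ds.accesses
    return all(used_cap[u.name] <= u.capacity and used_bw[u.name] <= u.bandwidth
               for u in units)

def phase1_allocate(structs: List[DataStructure],
                    units: List[MemoryUnit]) -> Optional[Allocation]:
    """Assumed greedy first phase: most-accessed structures first, each to
    the lowest-latency unit that keeps the allocation feasible."""
    alloc: Allocation = {}
    placed: List[DataStructure] = []
    for ds in sorted(structs, key=lambda d: d.accesses, reverse=True):
        placed.append(ds)
        for unit in sorted(units, key=lambda u: u.latency):
            alloc[ds.name] = unit.name
            if is_feasible(alloc, placed, units):
                break
        else:
            return None  # no memory unit can hold this structure
    return alloc

def phase2_tune(alloc: Allocation, structs: List[DataStructure],
                units: List[MemoryUnit],
                simulate: Callable[[Allocation], float]) -> Tuple[Allocation, float]:
    """Assumed second phase: steepest-ascent hill climbing over moves of a
    single structure, keeping the best simulated throughput seen."""
    best, best_score = dict(alloc), simulate(alloc)
    while True:
        candidate, candidate_score = None, best_score
        for ds in structs:
            for unit in units:
                if unit.name == best[ds.name]:
                    continue
                trial = {**best, ds.name: unit.name}
                if not is_feasible(trial, structs, units):
                    continue
                score = simulate(trial)
                if score > candidate_score:
                    candidate, candidate_score = trial, score
        if candidate is None:  # no reallocation improves throughput
            return best, best_score
        best, best_score = candidate, candidate_score

def make_toy_simulator(structs: List[DataStructure],
                       units: List[MemoryUnit]) -> Callable[[Allocation], float]:
    """Hypothetical stand-in for an NPU simulator (toy throughput model)."""
    latency = {u.name: u.latency for u in units}
    def simulate(alloc: Allocation) -> float:
        cost = sum(ds.accesses * latency[alloc[ds.name]] for ds in structs)
        return 1000.0 / cost
    return simulate

# Illustrative memory hierarchy and workload (made-up numbers).
units = [MemoryUnit("scratch", 2, 16, 10),
         MemoryUnit("sram", 10, 64, 20),
         MemoryUnit("dram", 50, 1024, 100)]
structs = [DataStructure("route_table", 48, 4),
           DataStructure("flow_state", 8, 6),
           DataStructure("counters", 4, 3),
           DataStructure("pkt_buffer", 512, 2)]

simulate = make_toy_simulator(structs, units)
initial = phase1_allocate(structs, units)
best, throughput = phase2_tune(initial, structs, units, simulate)
print(best, throughput)
```

Because `simulate` is passed in as a function, a real simulator could replace the toy model without changing the search. Starting `phase2_tune` from a deliberately poor allocation (for example, every structure in `dram`) shows the second phase climbing back to a higher-throughput allocation.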