Methods and apparatus are provided for implementing a programmable device
including a processor core, a hardware accelerator, and secondary
components such as memory. A portion of a program written in a high-level
language is automatically selected for hardware acceleration. Dedicated
ports are generated to allow the hardware accelerator to handle pointer
referencing and dereferencing. The hardware accelerator is generated to
perform pipelined processing of instructions. The number of stages
implemented for pipelined processing is at least partially dependent on
the latency associated with accessing secondary components.
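
The dependence of pipeline depth on access latency can be illustrated with a minimal sketch. The abstract does not give a sizing formula, so the rule below is an assumption made only for illustration: one stage per compute step plus one stage per cycle of latency of the secondary component being accessed, so that a new operation can be issued every cycle while earlier accesses are still in flight. The function names and parameters are hypothetical.

```python
def pipeline_stages(compute_stages: int, access_latency_cycles: int) -> int:
    """Hypothetical sizing rule (an assumption, not taken from the source):
    the accelerator gets one stage per compute step plus one stage per cycle
    of latency of the secondary component (e.g. memory) it accesses."""
    if compute_stages < 1:
        raise ValueError("compute_stages must be at least 1")
    if access_latency_cycles < 0:
        raise ValueError("access_latency_cycles must be non-negative")
    return compute_stages + access_latency_cycles


def cycles_to_process(iterations: int, stages: int) -> int:
    """Cycle count for a fully pipelined loop under the usual idealized model:
    the pipeline fills once, then completes one iteration per cycle."""
    if iterations < 0 or stages < 1:
        raise ValueError("iterations must be >= 0 and stages must be >= 1")
    if iterations == 0:
        return 0
    return stages + iterations - 1


if __name__ == "__main__":
    # Assumed figures: 3 compute steps, memory that answers after 4 cycles.
    depth = pipeline_stages(compute_stages=3, access_latency_cycles=4)
    print("stages:", depth)
    print("cycles for 100 iterations:", cycles_to_process(100, depth))
```

Under these assumptions, a slower secondary component lengthens the pipeline but adds only a fixed fill cost; steady-state throughput stays at one iteration per cycle.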