For use in a wide-issue pipelined processor, a mechanism and method for reducing
pipeline stalls between nested calls and supporting early prefetching of instructions
in nested subroutines, and a digital signal processor (DSP) incorporating the mechanism
or the method. In one embodiment, the mechanism includes: (1) a program counter
(PC) generator that generates return PC values for call instructions in a pipeline
of the processor and (2) return PC storage, coupled to the PC generator and located
in an execution core of the processor, that stores the return PC values and makes
each of them available to a PC of the processor upon execution
of the corresponding return instruction.
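The interplay between the PC generator and the return PC storage can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the patented design. It assumes that the return PC storage behaves as a last-in, first-out stack, which suits nested calls. It also assumes a fixed instruction width (`INSTR_BYTES`), an arbitrary storage depth, and a policy of dropping the oldest entry on overflow. All class and function names are hypothetical.

```python
INSTR_BYTES = 4  # hypothetical fixed instruction width, in bytes


class PCGenerator:
    """Hypothetical PC generator: derives a return PC for each call instruction."""

    def return_pc_for_call(self, call_pc: int) -> int:
        # Assumption: the return address is the instruction right after the call.
        return call_pc + INSTR_BYTES


class ReturnPCStorage:
    """Hypothetical LIFO return PC storage located in the execution core."""

    def __init__(self, depth: int = 8) -> None:
        self.depth = depth
        self._entries: list[int] = []

    def push(self, return_pc: int) -> None:
        if len(self._entries) == self.depth:
            # Assumed overflow policy: discard the oldest (outermost) entry.
            self._entries.pop(0)
        self._entries.append(return_pc)

    def pop(self) -> int:
        if not self._entries:
            raise RuntimeError("return executed with empty return PC storage")
        # The most recent call's return PC is handed to the processor's PC.
        return self._entries.pop()


def simulate(events, depth: int = 8) -> list[int]:
    """Replay ("call", pc) / ("ret", pc) events; return the PCs loaded on returns."""
    gen = PCGenerator()
    storage = ReturnPCStorage(depth)
    loaded_pcs = []
    for kind, pc in events:
        if kind == "call":
            storage.push(gen.return_pc_for_call(pc))
        elif kind == "ret":
            loaded_pcs.append(storage.pop())
    return loaded_pcs


if __name__ == "__main__":
    # Two nested calls followed by two returns.
    trace = [("call", 0x100), ("call", 0x200), ("ret", 0x300), ("ret", 0x204)]
    print([hex(pc) for pc in simulate(trace)])
```

For the nested trace above, the inner return loads `0x204` and the outer return loads `0x104`. Because each return PC is already held in the storage when the return instruction executes, a front end could, in principle, begin fetching at that address without waiting for the value to be recomputed. That is the kind of early prefetching the abstract describes.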