A hardware looping mechanism and method are described herein for handling
any number and/or type of discontinuity instruction that may arise when
executing program instructions within a scalar or superscalar processor.
For example, the hardware looping mechanism may provide zero-overhead
looping for branch instructions, in addition to single loop constructs
and multiple loop constructs (which may or may not be nested).
Zero-overhead looping may also be provided in special cases, e.g., when
servicing an interrupt or executing a branch-out-of-loop instruction. The
hardware looping mechanism described herein reduces the number of
instructions required to execute a program, as well as the overall time
and power consumed during program execution, and may be integrated
within any processor architecture without modifying existing program
code.
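
By way of illustration only, the reduction in instruction count may be sketched with a small behavioral model. The model below is not the mechanism described herein. It assumes a conventional arrangement in which hypothetical loop-start, loop-end and loop-count registers in the fetch logic redirect the program counter, and it assumes a loop body of at least one instruction that runs at least once. The function names and the one-instruction setup cost are likewise assumptions made for the example.

```python
def run_software_loop(body_len, iterations):
    """Count instructions executed when the loop is closed in software,
    i.e. by an explicit decrement and a conditional branch per iteration."""
    executed = 1                  # one instruction initialises the counter
    counter = iterations
    while counter > 0:
        executed += body_len      # the loop body itself
        executed += 1             # decrement the counter
        counter -= 1
        executed += 1             # conditional branch back to the loop start
    return executed


def run_hardware_loop(body_len, iterations):
    """Count instructions executed when hypothetical loop registers in the
    fetch logic redirect the program counter (assumed model, see lead-in).
    Requires body_len >= 1 and iterations >= 1."""
    executed = 1                  # one setup instruction loads the registers
    loop_start = 1                # address of the first body instruction
    loop_end = loop_start + body_len - 1
    loop_count = iterations
    pc = loop_start
    while True:
        executed += 1             # execute the body instruction at pc
        if pc == loop_end:
            loop_count -= 1       # updated by hardware, not by an instruction
            if loop_count == 0:
                break             # fall through to the code after the loop
            pc = loop_start       # redirect costs no executed instruction
        else:
            pc += 1
    return executed


if __name__ == "__main__":
    print(run_software_loop(4, 10), run_hardware_loop(4, 10))
```

In this model, a four-instruction body repeated ten times executes 61 instructions with the software loop and 41 with the assumed hardware loop registers. The difference is the decrement and branch that the software loop issues on every iteration.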