A macroscalar processor architecture is described herein. In one
embodiment, a processor receives instructions of a program loop having a
vector block and a sequence block intended to be executed after the
vector block, where the processor includes multiple slices and each of
the slices is capable of executing an instruction of an iteration of the
program loop substantially in parallel. For each iteration of the program
loop, the processor executes an instruction of the sequence block using
one of the slices while executing instructions of the vector block using
the remaining slices substantially in parallel. Other methods and
apparatuses are also described.
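
By way of illustration only, the scheduling described above may be sketched in software as follows. This is a simplified sequential simulation under stated assumptions, not the described hardware: the slice count of four, the names `vector_block`, `sequence_block`, and `run_loop`, and the example loop body (a doubling step followed by a running sum) are hypothetical and do not appear in the description. In each simulated step, one slice executes the sequence block for the oldest iteration whose vector block has completed, while the remaining slices execute the vector block for later iterations.

```python
from collections import deque

NUM_SLICES = 4  # assumed slice count; the description does not fix a number


def vector_block(x):
    # Hypothetical vector-block work: independent of other iterations.
    return x * 2


def sequence_block(acc, t):
    # Hypothetical sequence-block work: depends on the previous iteration's result.
    return acc + t


def run_loop(data, num_slices=NUM_SLICES):
    """Simulate the loop step by step and return (result, schedule).

    Each schedule entry records which iteration the sequence slice handled
    and which iterations the remaining slices handled in that step.
    """
    vector_slices = num_slices - 1  # one slice is reserved for the sequence block
    pending = deque()               # vector-block results awaiting the sequence block
    acc = 0
    next_iter = 0
    schedule = []
    while next_iter < len(data) or pending:
        step = {"sequence": None, "vector": []}
        # One slice: sequence block for the oldest iteration already vectorized.
        if pending:
            i, t = pending.popleft()
            acc = sequence_block(acc, t)
            step["sequence"] = i
        # Remaining slices: vector block for the next iterations (modeled as parallel).
        batch = range(next_iter, min(next_iter + vector_slices, len(data)))
        for i in batch:
            pending.append((i, vector_block(data[i])))
            step["vector"].append(i)
        next_iter = batch.stop
        schedule.append(step)
    return acc, schedule


if __name__ == "__main__":
    total, steps = run_loop([1, 2, 3, 4, 5])
    print(total)
    for n, step in enumerate(steps):
        print(n, step)
```

In this sketch the sequence block for a given iteration always runs after that iteration's vector block, which preserves the ordering stated above, while vector-block work for later iterations overlaps with it in the same step.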