A processor has a reduction unit that sums m input operands plus an
accumulator value, with the option of either saturating or wrapping
around the result of each addition. The reduction unit also
allows the m input operands to be subtracted from the accumulator value
by inverting the bits of the input operands and setting the carry input
of each of a plurality of reduction adders to one. The reduction unit
can be used in conjunction with m parallel multipliers to quickly perform
dot products and other vector operations with either saturating or
wrap-around arithmetic.
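
The behavior described above can be illustrated with a small software
model. This is only a sketch under stated assumptions, not the hardware
design: it assumes 16-bit two's-complement operands, a chain of m
reduction adders applied one after another, and products that already
fit in the operand width. The names reduction_add, reduce_operands, and
dot_product are hypothetical and chosen for illustration.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def reduction_add(acc, operand, subtract=False, saturate=False, bits=16):
    """Model one reduction adder: acc + operand, or acc - operand."""
    mask = (1 << bits) - 1
    if subtract:
        # Subtraction as described: invert the operand bits and set the
        # adder's carry input to one, since a - b == a + ~b + 1.
        operand = to_signed(~operand & mask, bits)
        carry_in = 1
    else:
        carry_in = 0
    total = acc + operand + carry_in  # exact sum before range handling
    if saturate:
        lowest = -(1 << (bits - 1))
        highest = (1 << (bits - 1)) - 1
        return max(lowest, min(highest, total))  # clamp to the range
    return to_signed(total, bits)  # wrap around modulo 2**bits


def reduce_operands(acc, operands, subtract=False, saturate=False, bits=16):
    """Combine m operands with the accumulator, one adder at a time."""
    for operand in operands:
        acc = reduction_add(acc, operand, subtract, saturate, bits)
    return acc


def dot_product(acc, xs, ys, saturate=False, bits=16):
    """Feed the m products (assumed to fit in `bits`) to the reduction."""
    products = [x * y for x, y in zip(xs, ys)]
    return reduce_operands(acc, products, saturate=saturate, bits=bits)
```

Because saturation or wrap-around is applied after each addition rather
than once at the end, the two modes can give different results for the
same operands when an intermediate sum leaves the representable range.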