Systems and apparatuses are presented relating to a programmable processor
comprising an execution unit that is operable to decode and execute
instructions received from an instruction path and partition data stored
in registers in a register file into multiple data elements, the
execution unit capable of executing a plurality of different group
floating-point and group integer arithmetic operations that each
arithmetically operates on multiple data elements stored in registers in the
register file to produce a catenated result that is returned to a
register in the register file, wherein the catenated result comprises a
plurality of individual results, and wherein the execution unit is capable of
executing group data handling operations that re-arrange data elements in
different ways in response to data handling instructions.
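
The group operations described above can be illustrated with a minimal software sketch. It is not the claimed hardware: it only models a register as an integer that is partitioned into equal-width elements, operated on element by element, and catenated back into one register value. The 128-bit register width, the function names (partition, catenate, group_add, group_reverse), and the choice of wraparound addition and element reversal as example operations are all assumptions made for illustration; the abstract does not specify them.

```python
REG_BITS = 128  # assumed register width; the abstract does not state one


def partition(reg, elem_bits):
    """Split a register value into elements, least significant element first."""
    mask = (1 << elem_bits) - 1
    count = REG_BITS // elem_bits
    return [(reg >> (i * elem_bits)) & mask for i in range(count)]


def catenate(elems, elem_bits):
    """Pack elements back into one register value, truncating each to elem_bits."""
    mask = (1 << elem_bits) - 1
    reg = 0
    for i, elem in enumerate(elems):
        reg |= (elem & mask) << (i * elem_bits)
    return reg


def group_add(a, b, elem_bits):
    """Hypothetical group integer add: element-wise sum with wraparound."""
    sums = [x + y for x, y in zip(partition(a, elem_bits), partition(b, elem_bits))]
    return catenate(sums, elem_bits)


def group_reverse(a, elem_bits):
    """Hypothetical group data handling operation: reverse the element order."""
    return catenate(partition(a, elem_bits)[::-1], elem_bits)


# Usage: four 32-bit elements per 128-bit register.
a = catenate([1, 2, 3, 4], 32)
b = catenate([10, 20, 30, 0xFFFFFFFF], 32)
added = partition(group_add(a, b, 32), 32)
reversed_a = partition(group_reverse(a, 32), 32)
```

In this sketch each element's carry is discarded at the element boundary, so one instruction-like call yields several independent results packed into a single register value, which is the sense of "catenated result" used above.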