A method and apparatus for including in a processor instructions for
performing multiply-add operations on packed byte data. In one
embodiment, a processor is coupled to a memory. The memory has stored
therein a first packed byte data and a second packed byte data. The
processor performs operations on data elements in said first packed byte
data and said second packed byte data to generate a third packed data in
response to receiving an instruction. A plurality of the data elements in
this third packed data store the results of performing multiply-add
operations on data elements in the first and second packed byte data.
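
As an illustration only, the multiply-add operation described above can be modeled in scalar C. The abstract does not state operand widths, signedness, pairing, or overflow handling, so the following are assumptions made for this sketch: each packed operand holds eight signed bytes, products of adjacent byte pairs are summed, each sum is stored in a signed 16-bit result element, and sums are saturated. The names packed_byte_multiply_add, saturate_i16, NUM_BYTES and NUM_RESULTS are hypothetical.

```c
#include <stdint.h>

/* Assumed for illustration: a 64-bit packed operand holding 8 byte elements. */
#define NUM_BYTES   8
/* Assumed: each result element combines one adjacent pair of byte products. */
#define NUM_RESULTS (NUM_BYTES / 2)

/* Clamp a 32-bit intermediate to the signed 16-bit range.
 * Saturation is an assumption; the abstract does not specify overflow behavior. */
static int16_t saturate_i16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Scalar model of a packed byte multiply-add:
 * out[i] = a[2i] * b[2i] + a[2i+1] * b[2i+1], for each result element i.
 * a and b model the first and second packed byte data; out models the
 * third packed data. */
static void packed_byte_multiply_add(const int8_t a[NUM_BYTES],
                                     const int8_t b[NUM_BYTES],
                                     int16_t out[NUM_RESULTS])
{
    for (int i = 0; i < NUM_RESULTS; i++) {
        /* Multiply corresponding byte elements in a wider type. */
        int32_t p0 = (int32_t)a[2 * i]     * (int32_t)b[2 * i];
        int32_t p1 = (int32_t)a[2 * i + 1] * (int32_t)b[2 * i + 1];
        /* Add the adjacent products and store one result element. */
        out[i] = saturate_i16(p0 + p1);
    }
}
```

Under these assumptions, first packed byte data {1, 2, 3, 4, -5, 6, -128, -128} and second packed byte data {10, 20, 30, 40, 7, -8, -128, -128} yield third packed data {50, 250, -83, 32767}, where the last element is saturated because the exact sum, 32768, exceeds the signed 16-bit range.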