A system and method for carrying out a two-dimensional forward and/or
inverse discrete cosine transform is disclosed herein. In one embodiment,
the method includes, but is not necessarily limited to: (1) receiving
multiple data blocks; (2) grouping together one respective element from
each of the multiple data blocks to provide full data vectors for
single-instruction-multiple-data (SIMD) floating-point instructions; and
(3) operating on the full data vectors with SIMD instructions to carry out
the two-dimensional transform on the multiple data blocks. Preferably, the
two-dimensional transform is carried out by performing a linear transform
on each row of the grouped elements, and then performing a linear
transform on each column of the grouped elements. The method may further
include isolating and arranging the two-dimensional transform coefficients
to form transform coefficient blocks that correspond to the originally
received multiple data blocks. The multiple data blocks may consist of
exactly two data blocks. The method may be implemented in the form of
software and conveyed on a digital information storage medium or
information transmission medium. The dual forward or inverse discrete
cosine transform methodology may be employed within a general purpose
computer or within a computation unit of a multimedia encoder or decoder
system, implemented either in hardware or software. A multimedia encoder
or decoder employing the fast, forward or inverse discrete cosine
transform methodology in accordance with the present invention may
advantageously achieve high performance.
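
The grouping, row-column transform, and isolation steps described above may be illustrated by the following minimal sketch. It is not the disclosed implementation: it assumes exactly two 8x8 blocks, an orthonormal DCT-II basis, and it models each two-lane SIMD vector as a plain Python tuple rather than using real SIMD instructions. The names `group`, `transform_1d`, and `dual_transform_2d` are hypothetical and chosen only for illustration.

```python
import math

N = 8  # assumed block size; the text above does not fix one


def dct_matrix(n=N):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows


C = dct_matrix()


def group(block_a, block_b):
    """Step (2): pair element (r, c) of each block into one two-lane vector."""
    return [[(block_a[r][c], block_b[r][c]) for c in range(N)]
            for r in range(N)]


def transform_1d(vectors, basis):
    """Linear transform of N two-lane vectors.

    Both lanes are multiplied by the same coefficient, which stands in for
    a single SIMD multiply-accumulate acting on a full data vector.
    """
    out = []
    for k in range(N):
        acc0 = acc1 = 0.0
        for i in range(N):
            w = basis[k][i]
            acc0 += w * vectors[i][0]
            acc1 += w * vectors[i][1]
        out.append((acc0, acc1))
    return out


def dual_transform_2d(block_a, block_b, inverse=False):
    """Step (3): transform two blocks at once, rows first, then columns."""
    # For an orthonormal basis, the inverse transform uses the transpose.
    basis = [list(col) for col in zip(*C)] if inverse else C
    grouped = group(block_a, block_b)
    # Row pass: transform each row of grouped elements.
    rows_done = [transform_1d(row, basis) for row in grouped]
    # Column pass: cols[c][r] is the result at row r, column c.
    cols = [transform_1d([rows_done[r][c] for r in range(N)], basis)
            for c in range(N)]
    # Isolate each lane into a coefficient block matching its input block.
    out_a = [[cols[c][r][0] for c in range(N)] for r in range(N)]
    out_b = [[cols[c][r][1] for c in range(N)] for r in range(N)]
    return out_a, out_b
```

In this sketch, calling `dual_transform_2d(a, b)` yields the forward coefficients of both blocks in one pass, and passing those results back with `inverse=True` recovers the original blocks up to floating-point rounding. On real hardware the tuple arithmetic in `transform_1d` would be replaced by packed floating-point instructions, so that each coefficient multiply serves both blocks.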