The discrete cosine transform (DCT) is mapped to a graphics processing
unit (GPU) instead of a central processing unit (CPU). The DCT can be
implemented using a shader-based process or a host-based process. A
matrix is applied to a set of pixel samples. The samples are processed in
either rows or columns first, and then the processing is performed in the
opposite direction. The number of times a shader program is changed is
minimized by processing all samples that use a particular shader (e.g.,
the first shader) at the same time (e.g., in sequence).