The invention provides, in some aspects, methods and apparatus for signal
and/or image processing which perform convolution-based filtering
utilizing a graphics processing unit (GPU, also called "graphics card")
to compute multiple output pixels at once. This has the advantage of
saving memory bandwidth, while leveraging the GPU's vector multiplication
and dot product units during the calculation. Related aspects of the
invention provide such methods and apparatus in which multiple output
pixels are computed simultaneously by using render targets with more than
one channel, e.g., an RGBA render target, or multiple render targets, or
a combination thereof. By way of non-limiting example, methods and
apparatus according to the invention implement convolution on a GPU by
executing the steps of defining an input image I(x,y) as an input texture of
size N_x × N_y; defining an RGBA render target (output) of
size N_x/4 × N_y; and, for each RGBA output pixel, aggregating
o(x,y) by (i) reading all input pixels I(x*4+i,y), with i=-4,0,4, and
(ii) computing o(x,y) for all four components of the output tuple.
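
By way of further non-limiting illustration, the steps above can be emulated on a CPU with the following Python sketch. It is not GPU shader code and is not taken from the invention. The function name convolve_rows_rgba, the zero padding at the image border, the restriction to a one-dimensional (row-wise) kernel of at most nine taps, and the reading of "i=-4,0,4" as three consecutive four-pixel blocks starting at input positions 4x-4, 4x and 4x+4 are all assumptions made for the example.

```python
def convolve_rows_rgba(image, kernel):
    """CPU emulation of a row-wise convolution writing four outputs per RGBA texel.

    image  : list of rows, each a list of N_x numbers (N_x a multiple of 4)
    kernel : list of 2*r+1 filter taps with r <= 4 (assumption of this sketch)
    returns: list of rows, each a list of N_x/4 tuples (R, G, B, A), where
             channel c of output texel x holds the filtered pixel 4*x + c
    """
    radius = len(kernel) // 2
    if len(kernel) % 2 != 1 or radius > 4:
        raise ValueError("sketch assumes an odd kernel with at most 9 taps")
    n_x = len(image[0])
    if n_x % 4 != 0:
        raise ValueError("sketch assumes N_x is a multiple of 4")

    def fetch(row, x):
        # Assumption: pixels outside the image read as zero.
        return row[x] if 0 <= x < n_x else 0.0

    target = []
    for row in image:
        out_row = []
        for x in range(n_x // 4):
            # Read the three 4-pixel blocks at offsets i = -4, 0, 4,
            # i.e. input pixels 4x-4 .. 4x+7; window[k] is pixel 4x-4+k.
            window = [fetch(row, 4 * x + i + j)
                      for i in (-4, 0, 4) for j in range(4)]
            # Compute all four components of the output tuple from the
            # same window, so each input pixel is read once per texel.
            texel = tuple(
                sum(kernel[t] * window[4 + c - t + radius]
                    for t in range(len(kernel)))
                for c in range(4)
            )
            out_row.append(texel)
        target.append(out_row)
    return target
```

For example, calling convolve_rows_rgba on a single row of eight pixels yields a row of two RGBA tuples. The twelve input values read for each tuple are shared by its four components, which is where the saving in memory bandwidth arises in this emulation.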