One embodiment of the present invention sets forth an architecture for
optimizing graphics rendering efficiency by advancing the Z-test
operation prior to pixel shading whenever possible. The current rendering
state, as maintained by the setup engine, determines whether the Z-test
function can be moved ahead of the shader engine for "early" Z-testing
or must be deferred until after shading operations for "late"
Z-testing. Data is dynamically routed to
each processing engine in the pipeline, so that the appropriate data flow
for either early Z or late Z is constructed according to the current
rendering state. Efficiency is gained by discarding pixels that fail the
Z-test before they enter the shader engine, relieving the shader engine
of unnecessary work. The same functional units are utilized in
both early Z and late Z configurations, minimizing any additional
hardware required for implementation.
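
By way of illustration only, the routing described above may be modeled in software as follows. This is a minimal sketch under stated assumptions, not the claimed hardware: the names RenderState, Fragment, early_z_allowed, z_test, flat_shade and render are hypothetical, and the two state flags are assumed examples of rendering state that could force late Z, since the text above does not enumerate the actual conditions. The same z_test and shade functions serve both configurations; only their order changes.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class RenderState:
    # Assumed example flags; the source does not list which state blocks early Z.
    shader_writes_depth: bool = False
    shader_may_discard: bool = False


@dataclass
class Fragment:
    x: int
    y: int
    z: float
    color: Optional[Tuple[int, int, int]] = None


def early_z_allowed(state: RenderState) -> bool:
    # Assumption: early Z is safe only if shading cannot change depth or kill pixels.
    return not (state.shader_writes_depth or state.shader_may_discard)


def z_test(frag: Fragment, depth_buffer: Dict[Tuple[int, int], float]) -> bool:
    # Less-than depth test; the depth buffer is updated when the fragment passes.
    key = (frag.x, frag.y)
    if frag.z < depth_buffer.get(key, float("inf")):
        depth_buffer[key] = frag.z
        return True
    return False


def flat_shade(frag: Fragment) -> Fragment:
    # Stand-in for the shader engine: assigns a constant color.
    return replace(frag, color=(255, 255, 255))


def render(
    fragments: List[Fragment],
    state: RenderState,
    shade: Callable[[Fragment], Fragment],
) -> Tuple[List[Fragment], int]:
    # Returns the fragments written and the number of fragments that were shaded.
    depth_buffer: Dict[Tuple[int, int], float] = {}
    early = early_z_allowed(state)
    written: List[Fragment] = []
    shaded_count = 0
    for frag in fragments:
        # Early Z: test first, so failing fragments never reach the shader.
        if early and not z_test(frag, depth_buffer):
            continue
        frag = shade(frag)
        shaded_count += 1
        # Late Z: the same test unit runs after shading instead.
        if not early and not z_test(frag, depth_buffer):
            continue
        written.append(frag)
    return written, shaded_count
```

In this model, both configurations write the same fragments for a given input, but the early Z path invokes the shade function only for fragments that pass the depth test.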