A method of calculating utilization and bottleneck performance parameters
of a processing unit within a graphics processing unit (GPU). The
utilization is the percentage of a draw call's execution time during which
the processing unit is utilized. The bottleneck is the time period that
the processing unit is active, plus the time period that the processing
unit is full and does not accept data from an upstream processing unit,
minus the time period that the processing unit is paused because a
downstream processing unit is busy and cannot accept data, all divided by
the execution time of the draw call. Performance parameters may
be determined by sampling the processing unit and incrementing a counter
each time a sampled condition, such as the unit being active, full, or
paused, is true. The method is repeated for each processing unit of the
GPU on the same draw call, and for each of a plurality of draw calls
comprising a frame.
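
The counter-based calculation described above can be illustrated with a minimal sketch. It is not the claimed implementation: the class name UnitCounters, its fields, and the sample() interface are hypothetical, sample counts stand in for time periods (which assumes a uniform sampling interval), and "utilized" is assumed to mean that the unit is active at the sampled instant.

```python
from dataclasses import dataclass


@dataclass
class UnitCounters:
    """Hypothetical per-unit sample counters for a single draw call."""
    active: int = 0   # samples where the unit was doing work
    full: int = 0     # samples where the unit refused upstream data
    paused: int = 0   # samples where the unit waited on a busy downstream unit
    total: int = 0    # all samples taken during the draw call

    def sample(self, is_active: bool, is_full: bool, is_paused: bool) -> None:
        """Increment each counter whose condition is true at this sample."""
        self.total += 1
        self.active += int(is_active)
        self.full += int(is_full)
        self.paused += int(is_paused)

    def utilization(self) -> float:
        """Percentage of samples in which the unit was active (assumed
        to be what 'utilized' means)."""
        if self.total == 0:
            return 0.0
        return 100.0 * self.active / self.total

    def bottleneck(self) -> float:
        """(active + full - paused) / execution time, in units of samples."""
        if self.total == 0:
            return 0.0
        return (self.active + self.full - self.paused) / self.total


# Illustrative, made-up samples: (is_active, is_full, is_paused)
counters = UnitCounters()
for state in [(True, False, False), (True, True, False),
              (False, False, True), (True, False, False)]:
    counters.sample(*state)

print(counters.utilization())  # 3 of 4 samples active
print(counters.bottleneck())   # (3 + 1 - 1) / 4
```

In this sketch one UnitCounters instance would be kept per processing unit per draw call, so repeating the measurement across units and across the draw calls of a frame amounts to collecting one such record for each pair.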