[Intel-gfx] Guidance on speed for various GL operations with Intel HW

Sun Feb 22 04:35:56 CET 2009

On Thu, 2009-02-19 at 19:36 +0000, Peter Clifton wrote:
> On Thu, 2009-02-19 at 17:42 +0000, Peter Clifton wrote:
> 
> [snip]
> 
> > Is there any model of performance to be expected from the various Intel
> > GPU pipelines? Any operations which are known to trigger fallbacks, or
> > slower rendering? I'm also using XOR operations in GL to draw the
> > crosshair - is there any penalty in that?
> 
> I have 10-15% time shown to be under mesa_clear when running our
> "benchmark" (refreshing as fast as we can), with no graphics primitives
> other than the cursor being shown.
> 
> The framerate caps at 15fps for drawing two XOR lines! (And the glClear
> of the stencil buffer two or three times for each of 10 layers). (This
> was running almost full-screen on a 1920x1200 display.)

glClear throughput seems to be the key here (and it is no better if I
emulate it by setting the appropriate state and drawing a QUAD onto the
screen). It makes little, if any, difference if I set a bitmask (to
force Mesa to use the 3D engine rather than the blitter to do the
clear).

(I was wondering if the throughput issue might have been waiting for
availability of the blitter).

It didn't make any difference if I explicitly asked for a visual with a
depth buffer and wiped that at the same time. (I thought it might
potentially be more efficient to clear the whole 32-bit-wide Depth +
Stencil buffer.)
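
As a rough sanity check on why clear throughput would dominate, the
figures quoted above (1920x1200, 10 layers, two or three stencil clears
per layer, 15fps) can be multiplied out. This is only a back-of-envelope
sketch: the helper name is made up for illustration, and it treats the
window as exactly full-screen, so it is an upper bound.

```c
#include <stdint.h>

/* Hypothetical helper: pixels touched per second by clears alone.
 * Assumes every clear covers the full width x height area. */
static uint64_t clear_pixels_per_second(uint32_t width, uint32_t height,
                                        uint32_t clears_per_frame,
                                        uint32_t fps)
{
    /* Widen before multiplying so the product cannot overflow 32 bits. */
    return (uint64_t)width * height * clears_per_frame * fps;
}
```

With 20 to 30 clears per frame this comes to roughly 0.69 to 1.04
gigapixels per second. If each cleared pixel costs 4 bytes of memory
traffic (an assumption, based on a packed 32-bit Depth + Stencil
layout), that would be about 2.8 to 4.1 GB/s spent on clears alone.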

So... nasty tricks time. I have 8 bits in my stencil buffer. I use 1 bit
for each masking operation (and I have at most two operating
concurrently). I've set up my code to use a different bit for each
successive operation which would otherwise have required a clear, and
then to clear a mask of the unused (but dirty) bit-planes when I run out
of pre-cleared bit-planes.
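
The bookkeeping for this can be sketched roughly as below. This is not
my actual code, just a minimal illustration of the idea; the struct and
function names are hypothetical, and the GL calls the caller would make
are only mentioned in comments so the sketch stands alone.

```c
#include <stdint.h>

/* Hypothetical tracker for the 8 stencil bit-planes. */
typedef struct {
    uint8_t dirty;   /* planes written since they were last cleared */
    uint8_t in_use;  /* planes held by a masking operation in progress */
} StencilPlanes;

/* Hand out a single-bit mask for a clean plane.
 * If no clean plane is left, *clear_mask is set to the dirty planes
 * that are not in use; the caller must clear exactly those planes,
 * e.g. glStencilMask(*clear_mask); glClear(GL_STENCIL_BUFFER_BIT);
 * before drawing. Returns 0 if every plane is currently in use. */
static uint8_t stencil_plane_acquire(StencilPlanes *s, uint8_t *clear_mask)
{
    *clear_mask = 0;
    uint8_t free_clean = (uint8_t)~(s->dirty | s->in_use);

    if (free_clean == 0) {
        uint8_t reclaimable = (uint8_t)(s->dirty & ~s->in_use);
        if (reclaimable == 0)
            return 0;               /* nothing left to hand out */
        *clear_mask = reclaimable;  /* one masked clear recycles them all */
        s->dirty = (uint8_t)(s->dirty & ~reclaimable);
        free_clean = reclaimable;
    }

    /* Pick the lowest available plane. */
    uint8_t bit = (uint8_t)(free_clean & -(int)free_clean);
    s->in_use |= bit;
    s->dirty |= bit;   /* assume the operation will write to it */
    return bit;
}

/* Mark a plane as no longer needed; it stays dirty until recycled. */
static void stencil_plane_release(StencilPlanes *s, uint8_t bit)
{
    s->in_use = (uint8_t)(s->in_use & ~bit);
}
```

With 8 planes and at most two in use at once, a real clear is only
needed about once per six masking operations instead of once per
operation, which is where the saving comes from.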

This yields a decent speedup, even if it does require some extra
complexity. I wish there was a GL_ONES glStencilOp to complement
GL_ZERO, though. I suspect I might need to flip all my stencil buffer
logic on its head, as the various update operations aren't particularly
expressive once you're trying to use them as individually masked
bit-planes.
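
One possible workaround for the missing GL_ONES, offered as a sketch
rather than something verified on this hardware: as far as I understand
the GL spec, the write mask set with glStencilMask applies to the result
of every stencil op, so GL_REPLACE with a reference value of 0xFF should
act as "set to ones" for just the planes enabled in the mask. The small
model below only imitates that update rule on a single 8-bit value; the
enum and function names are hypothetical and it makes no GL calls.

```c
#include <stdint.h>

/* Hypothetical model of the stencil update rule, not driver code. */
typedef enum { OP_KEEP, OP_ZERO, OP_REPLACE, OP_INVERT } StencilOpModel;

static uint8_t stencil_update(uint8_t stored, StencilOpModel op,
                              uint8_t ref, uint8_t write_mask)
{
    uint8_t result = stored;

    switch (op) {
    case OP_KEEP:    result = stored;           break;
    case OP_ZERO:    result = 0;                break;
    case OP_REPLACE: result = ref;              break;
    case OP_INVERT:  result = (uint8_t)~stored; break;
    }

    /* Only bits enabled in the write mask reach the buffer;
     * all other planes keep their stored value. */
    return (uint8_t)((stored & ~write_mask) | (result & write_mask));
}
```

For example, stencil_update(0x05, OP_REPLACE, 0xFF, 0x02) gives 0x07:
plane 1 is set and the other planes are untouched. Whether this fits
depends on the stencil test, since the same reference value is also
used there (masked by the glStencilFunc mask).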

For further optimisation...

Is it worth looking at Intel's VTune? Does it tie in with GPU
profiling, or am I just wishing?

Does anyone know of any tools similar to NVIDIA PerfHUD, or the Apple GL
profiling tools, that work on Mesa / Linux + Intel GPU hardware?

Best regards,

-- 
Peter Clifton

Electrical Engineering Division,
Engineering Department,
University of Cambridge,
9, JJ Thomson Avenue,
Cambridge
CB3 0FA

Tel: +44 (0)7729 980173 - (No signal in the lab!)