[Mesa-dev] shader-db, and justifying an i965 compiler optimization.

Wed May 18 08:05:39 PDT 2011

On Tue, May 17, 2011 at 11:22 PM, Eric Anholt <eric at anholt.net> wrote:
> One of the pain points of working on compiler optimizations has been
> justifying them -- sometimes I come up with something I think is
> useful and spend a day or two on it, but the value doesn't show up as
> fps in the application that suggested the optimization to me.  Then I
> wonder if this transformation of the code is paying off in general,
> and thus if I should push it.  If I don't push it, I end up bringing
> that patch out on every application I look at that it could affect, to
> see if now I finally have justification to get it out of a private
> branch.
>
> At a conference this week, we heard about how another team is
> using a database of (assembly) shaders, which they run through their
> compiler and count resulting instructions for testing purposes.  This
> sounded like a fun idea, so I threw one together.  Patch #1 is good in
> general (hey, link errors, finally!), but also means that a quick hack
> to glslparsertest makes it link a passing compile shader and therefore
> generate assembly that gets dumped under INTEL_DEBUG=wm.  Patch #2 I
> used for automatic scraping of shaders in every application I could
> find on my system at the time.  The open-source ones I pushed to:
>
> http://cgit.freedesktop.org/~anholt/shader-db
>
> And finally, patch #3 is something I built before but couldn't really
> justify until now.  However, given that it reduced fragment shader
> instructions 0.3% across 831 shaders (affecting 52 of them including
> yofrankie, warsow, norsetto, and gstreamer) and didn't increase
> instructions anywhere, I'm a lot happier now.
>
> Hopefully we hook up EXT_timer_query to apitrace soon so I can do more
> targeted optimizations and need this less :) In the meantime, I hope
> this can prove useful to others -- if you want to contribute
> appropriately-licensed shaders to the database so we track those, or
> if you want to make the analysis work on your hardware backend, feel
> free.
>

I have been thinking about doing something slightly different. Sadly,
instruction count is not necessarily the best metric for evaluating
the optimizations performed by a shader compiler. Hiding the texture
fetch latency of a shader can improve performance a lot more than
saving 2 instructions. So my idea was to write a GL app that renders
the same shader into a framebuffer object (FBO) a thousand times.
Rendering to an FBO keeps things like swapbuffers from playing a role,
since we are solely interested in shader performance. The FBO should
also be as big as possible so the fragment shader has a lot of pixels
to go through, and I believe we should disable things like blending
and the z-buffer so that no other part of the pipeline affects the
shader timing in any way.
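
A minimal sketch of the timing loop described above, kept free of GL
so it can be compiled on its own. The names bench_ns_per_fragment,
draw_pass_fn and count_pass are hypothetical. In a real harness the
callback would draw one full-FBO quad with the shader under test
(with blending and depth test disabled), and the harness would have
to wait for the GPU, for example with glFinish(), before reading the
clock. It assumes a POSIX system for clock_gettime().

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Hypothetical callback: in a real benchmark this would draw one
 * full-FBO quad with the shader under test. */
typedef void (*draw_pass_fn)(void *ctx);

/* Stand-in callback for dry runs: only counts how often it is called.
 * ctx must point to an int. */
void count_pass(void *ctx)
{
    int *calls = ctx;
    (*calls)++;
}

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Run `passes` identical draws and return the average wall-clock
 * time in nanoseconds per fragment, assuming every pass covers the
 * whole fbo_width x fbo_height target. */
double bench_ns_per_fragment(draw_pass_fn draw, void *ctx, int passes,
                             int fbo_width, int fbo_height)
{
    assert(draw && passes > 0 && fbo_width > 0 && fbo_height > 0);

    double start = now_seconds();
    for (int i = 0; i < passes; i++)
        draw(ctx);
    /* A real GL harness must wait for the GPU here (e.g. glFinish());
     * otherwise only command submission is being timed. */
    double elapsed = now_seconds() - start;

    double fragments = (double)passes * fbo_width * fbo_height;
    return elapsed * 1e9 / fragments;
}
```

Dividing by the fragment count is only one possible normalization; it
makes runs with different FBO sizes or pass counts easier to compare.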

Other things might play a role. For instance, if we provide a small
dummy texture we might hide the gain a texture fetch optimization
would give, as the GPU might be able to keep the whole texture in
cache and thus see very low latency on each texture fetch. The same
applies if we use the same texture for all units: the texture cache
might hide latency that a real application would otherwise face. So I
think we need big enough dummy textures, like 512*512, with a
different one for each unit, and we should also feed random u,v
coordinates to the texture fetches so that the texture cache doesn't
hide too much of the latency.
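
The texture setup above could be prepared on the CPU roughly like
this. It is only a sketch: fill_dummy_texture and fill_random_uv are
hypothetical names, RGBA8 is an assumed format, and the xorshift32
generator is just a convenient way to get reproducible noise.
Uploading the data to each texture unit and passing the u,v pairs to
the shader are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TEX_SIZE 512 /* texture edge length suggested above */

/* Small xorshift32 PRNG so the data is reproducible across runs.
 * The state must never be zero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fill one size x size RGBA8 texture with noise. Seeding by texture
 * unit gives every unit different contents. */
void fill_dummy_texture(uint8_t *rgba, int size, unsigned unit)
{
    assert(rgba && size > 0);

    uint32_t state = 0x9E3779B9u ^ (unit + 1u);
    if (state == 0)
        state = 1u;

    size_t n = (size_t)size * (size_t)size * 4;
    for (size_t i = 0; i < n; i++)
        rgba[i] = (uint8_t)(xorshift32(&state) >> 24);
}

/* Fill `count` (u, v) pairs with pseudo-random values in [0, 1), so
 * that neighbouring fetches land far apart in the texture. */
void fill_random_uv(float *uv, int count, uint32_t seed)
{
    assert(uv && count > 0);

    uint32_t state = seed ? seed : 1u;
    for (int i = 0; i < 2 * count; i++) {
        /* Keep 24 bits so the value is exact in a float and < 1. */
        uv[i] = (float)(xorshift32(&state) >> 8) / 16777216.0f;
    }
}
```

A caller would allocate TEX_SIZE * TEX_SIZE * 4 bytes per unit and
call fill_dummy_texture() once per unit with that unit's index.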

I am sure I am missing other factors that we should try to minimize
while testing shader performance.

I don't think such a tool is a good fit for piglit itself, but it
could still be added as a subtool (so that we don't add yet another
repository).

Thanks a lot for extracting all those shaders. I am sure we can get
some people to write us shaders with somewhat advanced math under an
acceptable license.

Cheers,
Jerome