[Bug 93681] GLSL compilation can be very slow

Tue Jan 12 11:00:19 PST 2016

https://bugs.freedesktop.org/show_bug.cgi?id=93681

--- Comment #1 from Matt Turner <mattst88 at gmail.com> ---
(In reply to Marc-Andre Lureau from comment #0)
> Created attachment 120988 [details]
> slow.shader_test
> 
> It seems the glsl compiler isn't that great wrt performance.

Can we please avoid qualitative judgments like this? Especially without
understanding what the problem is, such statements are needlessly pejorative.

> Please find
> attached a reasonably sized shader that takes >1m30s to compile:

I think we have very different definitions of what constitutes reasonably
sized.

> 
> time piglit/bin/shader_runner slow.shader_test -auto
> PIGLIT: {"result": "pass" }
> 
> real	1m47.913s
> user	1m47.676s
> sys	0m0.239s
> 
> 
> Similar bug is 87103

The problem (I believe) is that the shader uses a single large array

   vec4 temps[295];

to hold its temporary calculations (even though they are independent values),
but then indirectly addresses that array, forcing the whole array to be kept
alive. Moreover, we generate binary-search if ladders for the indirect
accesses, blowing up the number of instructions and basic blocks. 

Keeping 295 vec4s alive of course leads to spilling, which leads to a cascade
of slowdowns ultimately resulting in a massive and disgusting shader:

15530 instructions. 8 loops. 44798978 cycles. 296:3701 spills:fills. 3248 basic
blocks.

It's not clear to me if in all cases it's statically determinable whether only
a subset of the array can be indirectly accessed, as is the case here:

        temps[44].x = float((max( temps[43].xxxx , (vec4(0,0,0,0)))));
        temps[45].x = float((min( temps[44].xxxx , (vec4(5,5,5,5)))));
        temps[46].x = float(intBitsToFloat(ivec4( temps[45].xxxx )));
        addr0 = int(floatBitsToInt(temps[46].xxxx));
        temps[47].x = float(( temps[addr0 + 9].wwww .x));

... but I somewhat doubt it, because of the 8 multiply-nested loops.

Improving our handling of indirect addressing (i.e., not generating the if
ladders) would improve compile time and generated code quality, but I'm not
sure if it would eliminate spilling.

A range analysis pass feeding into an alias analysis might be able to decipher
what's going on with temps[295], but again it's not clear to me from a cursory
glance whether the indirect accesses are statically determinable to occur in a
particular range of the array. If they were, that would drastically improve
this shader's compile time and generated code. Both of those passes would be a
lot of work to implement.
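
For the one access quoted above, the range reasoning can be sketched with
simple interval arithmetic. This is a hypothetical illustration in Python, not
an existing compiler pass: the Interval class and quoted_access_range are
invented names, NaN inputs are ignored, and only straight-line code is
handled. The nested loops are exactly what this sketch does not cover.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Inclusive range of values an expression may take."""
    lo: float
    hi: float

    def max_const(self, c):
        return Interval(max(self.lo, c), max(self.hi, c))

    def min_const(self, c):
        return Interval(min(self.lo, c), min(self.hi, c))

    def to_int(self):
        # Float-to-int conversion truncates toward zero.
        # Assumes both bounds are finite (true after clamping).
        return Interval(math.trunc(self.lo), math.trunc(self.hi))

    def add_const(self, c):
        return Interval(self.lo + c, self.hi + c)


def quoted_access_range():
    """Range of the index in temps[addr0 + 9] for the quoted snippet."""
    t43 = Interval(-math.inf, math.inf)  # nothing known about temps[43].x
    t44 = t43.max_const(0)               # max(temps[43], 0)
    t45 = t44.min_const(5)               # min(temps[44], 5)
    addr0 = t45.to_int()                 # ivec4(...) then bit-cast round trip
    return addr0.add_const(9)            # temps[addr0 + 9]


if __name__ == "__main__":
    r = quoted_access_range()
    print(f"index range: [{r.lo}, {r.hi}]")
```

Under those assumptions the index stays within [9, 14], so this particular
access could only touch six of the 295 elements.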

Since I cannot imagine a human writing such a shader, I have to assume that it
was generated programmatically. Can we improve the program generating the GLSL
shader to create a more transparent shader? Specifically, it could split the
huge indirectly-addressed array into independently accessed components.
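
One way a generator could approach that split is sketched below in Python. It
assumes the generator already knows the inclusive index range of every
indirect access, which is an assumption and not something established in this
report. The name plan_split and the example ranges are hypothetical.

```python
def plan_split(array_size, indirect_ranges):
    """Partition indices 0..array_size-1 into groups that must remain
    arrays (reachable by an indirect access) and indices that can become
    independent temporaries. Ranges are inclusive (lo, hi) pairs."""
    merged = []
    for lo, hi in sorted(indirect_ranges):
        if lo < 0 or hi >= array_size or lo > hi:
            raise ValueError(f"bad range ({lo}, {hi})")
        if merged and lo <= merged[-1][1]:
            # Overlaps the previous group: the two must share one array.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    in_array = {i for lo, hi in merged for i in range(lo, hi + 1)}
    independent = [i for i in range(array_size) if i not in in_array]
    return merged, independent


if __name__ == "__main__":
    # Hypothetical input: a single indirect access covering indices 9..14.
    arrays, scalars = plan_split(295, [(9, 14)])
    print(f"{len(arrays)} small array(s), {len(scalars)} independent temporaries")
```

With only that one range, 289 of the 295 elements would no longer need to be
kept alive together; a real shader would have more indirect accesses and
therefore more or larger groups.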
