[Mesa-dev] Compiling of shader gets stuck in infinite loop

Thu Sep 11 01:02:49 PDT 2014

Hi,

I have been looking into this bug:

Compiling of shader gets stuck in infinite loop
https://bugs.freedesktop.org/show_bug.cgi?id=78468

Although this shows up at link time, after the Intel driver has run some
of its own lowering passes, it looks like the problem could hit other
drivers under the right conditions, since the actual problem happens
inside the common optimization passes.

I reproduced the problem with a very simple shader like this:

uniform sampler2D tex;
out vec4 FragColor;
void main()
{
   vec4 col = texture(tex, vec2(0, 0));
   for (int i=0; i<30; i++)
      col += vec4(0.1, 0.1, 0.1, 0.1);
   col = vec4(col.rgb / 2.0, col.a);
   FragColor = col;
}

and for this shader, I traced the problem down to the fact that
do_tree_grafting() is generating instructions like this:

(assign  (x) (var_ref flattening_tmp_y@116)  (expression float * (swiz x
(expression float + (swiz x (expression float + (swiz x (expression
float + (swiz x (expression float + (swiz x (expression float + (swiz x
(expression float + (swiz x (expression float + (swiz x (expression
float + (swiz x (expression float + (swiz x (expression float + (swiz x
(expression float + (swiz x (expression float + (swiz x (expression
float + (swiz x (expression float + (swiz x (expression float + (swiz x
(expression float + (swiz x (expression float + (swiz x (expression
float + (swiz x (expression float + (swiz x (expression float + (swiz x
(expression float + (swiz x (expression float + (swiz x (expression
float + (swiz x (expression float + (swiz x (expression float + (swiz x
(expression float + (swiz x (expression float + (swiz x (expression
float + (swiz x (expression float + (var_ref col_y) (constant float
(0.100000)) ) )(constant float (0.100000)) ) )(constant float
(0.100000)) ) )(constant float (0.100000)) ) )(constant float
(0.100000)) ) )(constant float (0.100000)) ) )(constant float
(0.100000)) ) )(constant float (0.100000)) ) )(constant float
(0.100000)) ) )(constant float (0.100000)) ) )(constant float
(0.100000)) ) )(constant float (0.100000)) ) )(constant float
(0.100000)) ) )(constant float (0.100000)) ) )(constant float
(0.100000)) ) )(constant float (0.100000)) ) )(constant float
(0.100000)) ) )(constant float (0.100000)) ) )(constant float
(0.100000)) ) )(constant float (0.100000)) ) )(constant float
(0.100000)) ) )(constant float (0.100000)) ) )(constant float
(0.100000)) ) )(constant float (0.100000)) ) )(constant float
(0.100000)) ) )(constant float (0.100000)) ) )(constant float
(0.100000)) ) )(constant float (0.100000)) ) )(constant float
(0.100000)) ) )(constant float (0.500000)) ) ) 

When we feed these to do_constant_folding(), it takes forever to
finish. For this shader in particular, removing the tree grafting pass
from do_common_optimization() eliminates the problem.
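
To make the shape of the problem concrete, here is a minimal sketch of how grafting single-use temporaries from an unrolled loop yields one deeply nested expression. It is a toy model only, not Mesa's do_tree_grafting(): the tuple encoding, the temporary names t0..tN and the helpers unrolled_loop(), graft() and depth() are all assumptions made up for illustration, and swizzles are left out.

```python
# Toy model of tree grafting (hypothetical, not Mesa code).
# Expressions are nested tuples:
#   ("+", lhs, rhs), ("*", lhs, rhs), ("var", name), ("const", value)

def unrolled_loop(iterations):
    """Model `col += 0.1` unrolled `iterations` times.

    Returns (assignments, last), where every temporary is written once
    and read once, which is the case grafting is meant to handle.
    """
    assignments = []
    prev = ("var", "col_y")
    for i in range(iterations):
        name = f"t{i}"
        assignments.append((name, ("+", prev, ("const", 0.1))))
        prev = ("var", name)
    return assignments, prev

def graft(assignments, expr):
    """Substitute each temporary's defining expression into its single use."""
    defs = dict(assignments)

    def subst(e):
        if e[0] == "var" and e[1] in defs:
            return subst(defs[e[1]])
        if e[0] in ("+", "*"):
            return (e[0], subst(e[1]), subst(e[2]))
        return e

    return subst(expr)

def depth(e):
    """Number of nested binary operations along the deepest path."""
    if e[0] in ("+", "*"):
        return 1 + max(depth(e[1]), depth(e[2]))
    return 0

assignments, last = unrolled_loop(30)
final = graft(assignments, ("*", last, ("const", 0.5)))
print(depth(final))  # 31: thirty chained adds under one multiply
```

The nesting depth grows linearly with the number of unrolled iterations, which matches the shape of the dump above: one add per iteration, all chained under the final multiply by 0.5.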

Note that small, seemingly irrelevant changes to the shader code can
keep this from happening. For example, if we initialize 'col' to
something like vec4(0,0,0,0) instead of using the texture function, or
if we remove the division by 2.0 in the last assignment to 'col', these
instructions are never produced and the shader compiles okay.

The number of iterations in the loop also matters. If there are too
many, we do not unroll the loop and the problem never happens. If there
are too few, then rather than generating a very large tree of
expressions like the one above, we generate something like this and the
problem, again, does not happen (notice how it adds 0.1 nine times to
make 0.9 rather than chaining 9 add expressions for 10 iterations of
the loop):

(assign  (x) (var_ref flattening_tmp_y)  (expression float * (expression
float + (constant float (0.900000)) (var_ref col_y) ) (constant float
(0.500000)) ) )
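
The compact form above is what you get when the constant operands of an add chain are collected into a single constant. The sketch below shows that rewrite on a toy tuple encoding. I do not know which Mesa pass produces this result or how it is implemented, so split_constants() and reassociate() are hypothetical helpers, not Mesa functions.

```python
# Toy reassociation of an add chain (hypothetical, not Mesa code).
# Expressions are nested tuples: ("+", lhs, rhs), ("var", name), ("const", value)

def split_constants(e):
    """Return (non-constant operands, sum of constants) for a chain of adds."""
    if e[0] == "const":
        return [], e[1]
    if e[0] == "+":
        l_terms, l_sum = split_constants(e[1])
        r_terms, r_sum = split_constants(e[2])
        return l_terms + r_terms, l_sum + r_sum
    return [e], 0.0

def reassociate(e):
    """Rebuild the chain as (constant + term + term + ...)."""
    terms, total = split_constants(e)
    result = ("const", total)
    for t in terms:
        result = ("+", result, t)
    return result

# Nine chained additions of 0.1 to col_y.
chain = ("var", "col_y")
for _ in range(9):
    chain = ("+", chain, ("const", 0.1))

folded = reassociate(chain)
print(folded)  # one add of a single constant (about 0.9) and col_y
```

Since floating-point addition is not associative, regrouping like this can change the rounding of the result slightly, which is one reason a compiler might only apply it in limited cases.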

So it seems that whether we generate a huge chunk of expressions or not
is subject to a number of factors, but when the right conditions are met
we can generate code that can stall compilation forever.
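
As a rough illustration of why depth alone can matter so much, here are two hypothetical cost models for a pass walking a chain of nested expressions. Neither is a claim about what do_constant_folding() actually does. They only show how quickly the work grows if a pass re-walks the remaining subtree at every level, or evaluates its operand twice at every level. The function names are made up for this sketch.

```python
# Hypothetical cost models for walking a chain that is `levels` deep.
# These do not describe Mesa's passes; they only show growth rates.

def rewalk_cost(levels):
    """Node visits if every level re-walks everything below it (quadratic)."""
    return sum(range(1, levels + 1))

def double_eval_cost(levels):
    """Node visits if every level evaluates its operand twice (exponential)."""
    cost = 1  # the leaf
    for _ in range(levels):
        cost = 1 + 2 * cost
    return cost

for levels in (10, 30):
    print(levels, rewalk_cost(levels), double_eval_cost(levels))
```

With 30 levels the first model stays in the hundreds of visits while the second exceeds two billion, so any accidental doubling per level would be enough to make a 30-iteration unrolled loop look like an infinite loop.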

Reading what tree grafting is supposed to do, this does not seem to be
an unexpected result, so I wonder what the right way to fix this would
be. It looks like we would want to do whatever we do when the loop has
only a few iterations, but I don't know why we generate different code
in that case, and I am not familiar enough with all the optimization
and lowering passes to assess what would make sense here. Any
suggestions?

Iago