[Mesa-dev] Register spilling issues in the NIR->vec4 backend

Wed Jul 15 07:49:01 PDT 2015

Hi,

when we sent the patches for the new nir->vec4 backend we mentioned that
we had a few dEQP tests that would fail to link because of register
spilling. Now that we have added GS support we see a few instances of
this problem popping up in a few GS piglit tests too, for example this
one:

tests/spec/glsl-1.50/execution/variable-indexing/gs-input-array-vec4-index-rd.shader_test

I have been looking into what is going on with these tests and I came to
the conclusion that the problem is a consequence of various factors, but
probably the main thing contributing to it is the way our SSA pass
works. That said, I am not that experienced with NIR, so it could also
be that my analysis is missing something and I am just arriving at the
wrong conclusions, so I'll explain my thoughts below and hopefully someone
else with more NIR experience can jump in and confirm or reject my
analysis.

The GS code in that test looks like this:

for (int p = 0; p < 3; p++) {
   color = ((index >= ins[p].m1.length() ?  
            ins[p].m2[index-ins[p].m1.length()] :
            ins[p].m1[index]) == expect) ?
               vec4(0.0, 1.0, 0.0, 1.0) : vec4(1.0, 0.0, 0.0, 1.0);
   gl_Position = gl_in[p].gl_Position;
   EmitVertex();
}

One thing that is immediately contributing to the register pressure is
some really awful code generated because of the indirect array indexing
on the inputs inside the loop. This is because of the
lower_variable_index_to_cond_assign lowering pass called from
brw_shader.cpp. This pass will convert that color assignment into a
bunch of nested if/else statements which makes the generated GLSL IR
code rather large, involving plenty of temporaries too. This is only
made worse by the fact that loop unrolling will replicate that 3 times.
The result is a huge pile of GLSL IR with a few dozen nested if/else
statements and temporaries that looks like [1] (that is only a fragment
of the GLSL IR).

One thing that is particularly relevant in that code is that it has
multiple conditional assignments to the same variable
(dereference_array_value) as a consequence of this lowering pass.
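
To make the effect of that lowering concrete, here is a minimal Python
sketch (a toy model, not Mesa code). It assumes a simplified linear form
of the lowering: one comparison and one conditional assignment per array
element, all writing to the same temporary. The function names are
hypothetical; only dereference_array_value comes from the GLSL IR dump,
and the real pass works on GLSL IR and produces nested if/else blocks.

```python
def indexed_read(arr, index):
    """Direct variable-index read, as written in the shader source."""
    return arr[index]


def lowered_read(arr, index):
    """The same read after a simplified conditional-assignment lowering.

    Assumption: a linear chain of compares; the real pass emits nested
    if/else statements, but the key property is the same: many
    conditional writes, one destination variable.
    """
    dereference_array_value = None  # the single temporary
    for i in range(len(arr)):  # fully unrolled in the real IR
        if index == i:
            dereference_array_value = arr[i]  # conditional write
    return dereference_array_value
```

For any in-range index both functions return the same element, but the
lowered form contains len(arr) separate writes to the temporary.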

That much, however, is common to the NIR and non-NIR paths. The problem
in the NIR case is that all these assignments generate new SSA values,
which then become new registers in the final NIR form. This leads to NIR
code like [2]. In contrast, the old vec4 visitor path makes all writes
to the same variable go to the same register.

As a result, if I print the code right before register allocation in the
NIR path [3] and I compare that to what we get with the old vec4 visitor
path at that same point [4], it is clearly visible that this difference
is allowing the vec4 visitor path to reduce register pressure (see how
in [4] we have multiple writes to vgrf5, while in [3] we always write to
a new vgrf every time).
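
The difference can be illustrated with a small Python toy model (not
Mesa code, and not a register allocator: it ignores liveness, coalescing
and spilling). It only counts how many distinct registers get named
under the two schemes described above. The function names and the write
list are made up for illustration.

```python
def registers_per_variable(writes):
    """Old-visitor style: every write to a variable reuses its register."""
    regs = {}
    for var in writes:
        regs.setdefault(var, len(regs))  # allocate only on first write
    return len(regs)


def registers_per_write(writes):
    """SSA style: every write defines a new value in a new register."""
    return len(writes)


# Hypothetical write sequence: six conditional writes to the lowered
# temporary followed by one write to color.
writes = ["dereference_array_value"] * 6 + ["color"]
```

With this input, registers_per_variable(writes) gives 2 while
registers_per_write(writes) gives 7, which is the same shape of
difference as repeated writes to vgrf5 versus a new vgrf per write.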

So, am I missing something or is this kind of result expected with NIR
programs? Is there anything in the nir->vec4 pass that we can do to fix
this or does this need to be fixed when going out of SSA mode inside NIR?

Iago

[1] http://pastebin.com/5uA8ex2S
[2] http://pastebin.com/pqLfvAVN
[3] http://pastebin.com/64nSuUH8
[4] http://pastebin.com/WCrdYxzt