[Mesa-dev] Register spilling issues in the NIR->vec4 backend

Thu Jul 16 08:15:27 PDT 2015

On Jul 15, 2015 11:20 PM, "Iago Toral" <itoral at igalia.com> wrote:
>
> On Wed, 2015-07-15 at 11:02 -0700, Connor Abbott wrote:
> > On Wed, Jul 15, 2015 at 7:49 AM, Iago Toral <itoral at igalia.com> wrote:
> > > Hi,
> > >
> > > when we sent the patches for the new nir->vec4 backend we mentioned
that
> > > we had a few dEQP tests that would fail to link because of register
> > > spilling. Now that we have added GS support we see a few instances of
> > > this problem popping up in a few GS piglit tests too, for example this
> > > one:
> > >
> > >
tests/spec/glsl-1.50/execution/variable-indexing/gs-input-array-vec4-index-rd.shader_test
> > >
> > > I have been looking into what is going on with these tests and I came
to
> > > the conclusion that the problem is a consequence of various factors,
but
> > > probably the main thing contributing to it is the way our SSA pass
> > > works. That said, I am not that experienced with NIR, so it could also
> > > be that my analysis is missing something and I am just arriving to
wrong
> > > conclusions, so I'll explain my thoughts below and hopefully someone
> > > else with more NIR experience can jump in and confirm or reject my
> > > analysis.
> > >
> > > The GS code in that test looks like this:
> > >
> > > for (int p = 0; p < 3; p++) {
> > >    color = ((index >= ins[p].m1.length() ?
> > >             ins[p].m2[index-ins[p].m1.length()] :
> > >             ins[p].m1[index]) == expect) ?
> > >                vec4(0.0, 1.0, 0.0, 1.0) : vec4(1.0, 0.0, 0.0, 1.0);
> > >    gl_Position = gl_in[p].gl_Position;
> > >    EmitVertex();
> > > }
> > >
> > > One thing that is immediately contributing to the register pressure is
> > > some really awful code generated because of the indirect array
indexing
> > > on the inputs inside the loop. This is because of the
> > > lower_variable_index_to_cond_assign lowering pass called from
> > > brw_shader.cpp. This pass will convert that color assignment into a
> > > bunch of nested if/else statements which makes the generated GLSL IR
> > > code rather large, involving plenty of temporaries too. This is only
> > > made worse by the fact that loop unrolling will replicate that 3
times.
> > > The result is a huge pile of GLSL IR with a few dozens of nested
if/else
> > > statements and temporaries that looks like [1] (that is only a
fragment
> > > of the GLSL IR).
> > >
> > > One thing that is particularly relevant in that code is that it has
> > > multiple conditional assignments to the same variable
> > > (dereference_array_value) as a consequence of this lowering pass.
> > >
> > > That much, however, is common to the NIR and non-NIR paths. The
problem
> > > in the NIR case is that all these assignments generate new SSA values,
> > > which then become new registers in the final NIR form. This leads to
NIR
> > > code like [2].  In contrast, the old vec4 visitor path, is able to
have
> > > writes to the same variable write to the same register.
> > >
> > > As a result, if I print the code right before register allocation in
the
> > > NIR path [3] and I compare that to what we get with the old vec4
visitor
> > > path at that same point [4], it is clearly visible that this
difference
> > > is allowing the vec4 visitor path to reduce register pressure (see how
> > > in [4] we have multiple writes to vgrf5, while in [3] we always write
to
> > > a new vgrf every time).
> > >
> > > So, am I missing something or is this kind of result expected with NIR
> > > programs? Is there anything in the nir->vec4 pass that we can do to
fix
> > > this or does this need to be fixed when going out of SSA moe inside
NIR?
> > >
> > > Iago
> > >
> > > [1] http://pastebin.com/5uA8ex2S
> > > [2] http://pastebin.com/pqLfvAVN
> > > [3] http://pastebin.com/64nSuUH8
> > > [4] http://pastebin.com/WCrdYxzt
> > >
> > > _______________________________________________
> > > mesa-dev mailing list
> > > mesa-dev at lists.freedesktop.org
> > > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> >
> > Hi Iago,
> >
> > Indeed, NIR does convert conditional writes to conditional selectss --
> > it's a required part of the conversion to SSA, and since our HW has a
> > conditional select instruction that's just as fast as doing a
> > conditional move, we haven't bothered much to try and change it back
> > during out-of-SSA. However, doing this shouldn't make things worse. In
> > your example, vgrf9, vgrf15, and vgrf17 all have very short live
> > intervals and don't interfere with vgrf11 (unless there's another use
> > of them somewhere after the snippet you pasted), which means that the
> > register allocator is free to allocate the destinations of all the
> > selects to the same register.
> >
> > What's happening, though, is that you're running into our terrible
> > liveness analysis. After doing the proper liveness analysis, we figure
> > out the place each register first becomes live and last becomes dead,
> > and then we consider registers that have overlapping ranges to
> > interfere. So we consider vgrf11 to interfere with vgrf15 and vgrf17,
> > even though it really doesn't. The trouble with making it do the right
> > thing is that we may actually need to extend the live ranges of
> > registers when the exec masks don't match up, either because one uses
> > writemask_all or because they have incompatible exec masks due to
> > containing different datatypes (half-float vs. float, etc.). For
> > example, in your snippet, if vgrf15 were to be written with
> > force_writemask_all, then it actually would interfere with vgrf11. But
> > right now we're simply being conservative and assuming everything is
> > incompatible, which produces extra register pressure in situations
> > like the one you have. We have the same problem with FS, and I have
> > some idea how to make it better there, but I'm not sure if we would
> > care enough to port it to vec4.
> >
> > So the long answer is that there are some unfortunate features of the
> > backend that are making things much worse than they should be wrt
> > register allocation, and separating things out like SSA does seems to
> > make it worse in certain situations. The short answer is: go fix
> > register spilling!
> >
> > Connor
>
> I see, thanks for the detailed reply Connor. I'll have a look at the
> vec4 register spilling code and Ben's work in that area and see what I
> can do.

Thanks for working on that, it is the core problem here.  In the FS, we
have an INTEL_DEBUG=spill option that triggers tells it to spill
*everything*. That helps a lot when debugging spilling.
--Jason

> Iago
>
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20150716/e78ebcdb/attachment-0001.html>