[Mesa-dev] Register spilling issues in the NIR->vec4 backend

Ben Widawsky ben at bwidawsk.net
Mon Jul 20 10:29:25 PDT 2015


On Mon, Jul 20, 2015 at 03:35:26PM +0200, Iago Toral wrote:
> Hi,
> On Thu, 2015-07-16 at 08:15 -0700, Jason Ekstrand wrote:
> > 
> > On Jul 15, 2015 11:20 PM, "Iago Toral" <itoral at igalia.com> wrote:
> > >
> > > On Wed, 2015-07-15 at 11:02 -0700, Connor Abbott wrote:
> > > > On Wed, Jul 15, 2015 at 7:49 AM, Iago Toral <itoral at igalia.com>
> > wrote:
> > > > > Hi,
> > > > >
> > > > > when we sent the patches for the new nir->vec4 backend we
> > mentioned that
> > > > > we had a few dEQP tests that would fail to link because of
> > register
> > > > > spilling. Now that we have added GS support we see a few
> > instances of
> > > > > this problem popping up in a few GS piglit tests too, for
> > example this
> > > > > one:
> > > > >
> > > > >
> > tests/spec/glsl-1.50/execution/variable-indexing/gs-input-array-vec4-index-rd.shader_test
> > > > >
> > > > > I have been looking into what is going on with these tests and I
> > came to
> > > > > the conclusion that the problem is a consequence of various
> > factors, but
> > > > > probably the main thing contributing to it is the way our SSA
> > pass
> > > > > works. That said, I am not that experienced with NIR, so it
> > could also
> > > > > be that my analysis is missing something and I am just arriving
> > to wrong
> > > > > conclusions, so I'll explain my thoughts below and hopefully
> > someone
> > > > > else with more NIR experience can jump in and confirm or reject
> > my
> > > > > analysis.
> > > > >
> > > > > The GS code in that test looks like this:
> > > > >
> > > > > for (int p = 0; p < 3; p++) {
> > > > >    color = ((index >= ins[p].m1.length() ?
> > > > >             ins[p].m2[index-ins[p].m1.length()] :
> > > > >             ins[p].m1[index]) == expect) ?
> > > > >                vec4(0.0, 1.0, 0.0, 1.0) : vec4(1.0, 0.0, 0.0,
> > 1.0);
> > > > >    gl_Position = gl_in[p].gl_Position;
> > > > >    EmitVertex();
> > > > > }
> > > > >
> > > > > One thing that is immediately contributing to the register
> > pressure is
> > > > > some really awful code generated because of the indirect array
> > indexing
> > > > > on the inputs inside the loop. This is because of the
> > > > > lower_variable_index_to_cond_assign lowering pass called from
> > > > > brw_shader.cpp. This pass will convert that color assignment
> > into a
> > > > > bunch of nested if/else statements which makes the generated
> > GLSL IR
> > > > > code rather large, involving plenty of temporaries too. This is
> > only
> > > > > made worse by the fact that loop unrolling will replicate that 3
> > times.
> > > > > The result is a huge pile of GLSL IR with a few dozens of nested
> > if/else
> > > > > statements and temporaries that looks like [1] (that is only a
> > fragment
> > > > > of the GLSL IR).
> > > > >
> > > > > One thing that is particularly relevant in that code is that it
> > has
> > > > > multiple conditional assignments to the same variable
> > > > > (dereference_array_value) as a consequence of this lowering
> > pass.
> > > > >
> > > > > That much, however, is common to the NIR and non-NIR paths. The
> > problem
> > > > > in the NIR case is that all these assignments generate new SSA
> > values,
> > > > > which then become new registers in the final NIR form. This
> > leads to NIR
> > > > > code like [2].  In contrast, the old vec4 visitor path, is able
> > to have
> > > > > writes to the same variable write to the same register.
> > > > >
> > > > > As a result, if I print the code right before register
> > allocation in the
> > > > > NIR path [3] and I compare that to what we get with the old vec4
> > visitor
> > > > > path at that same point [4], it is clearly visible that this
> > difference
> > > > > is allowing the vec4 visitor path to reduce register pressure
> > (see how
> > > > > in [4] we have multiple writes to vgrf5, while in [3] we always
> > write to
> > > > > a new vgrf every time).
> > > > >
> > > > > So, am I missing something or is this kind of result expected
> > with NIR
> > > > > programs? Is there anything in the nir->vec4 pass that we can do
> > to fix
> > > > > this or does this need to be fixed when going out of SSA moe
> > inside NIR?
> > > > >
> > > > > Iago
> > > > >
> > > > > [1] http://pastebin.com/5uA8ex2S
> > > > > [2] http://pastebin.com/pqLfvAVN
> > > > > [3] http://pastebin.com/64nSuUH8
> > > > > [4] http://pastebin.com/WCrdYxzt
> > > > >
> > > > > _______________________________________________
> > > > > mesa-dev mailing list
> > > > > mesa-dev at lists.freedesktop.org
> > > > > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > > >
> > > > Hi Iago,
> > > >
> > > > Indeed, NIR does convert conditional writes to conditional
> > selectss --
> > > > it's a required part of the conversion to SSA, and since our HW
> > has a
> > > > conditional select instruction that's just as fast as doing a
> > > > conditional move, we haven't bothered much to try and change it
> > back
> > > > during out-of-SSA. However, doing this shouldn't make things
> > worse. In
> > > > your example, vgrf9, vgrf15, and vgrf17 all have very short live
> > > > intervals and don't interfere with vgrf11 (unless there's another
> > use
> > > > of them somewhere after the snippet you pasted), which means that
> > the
> > > > register allocator is free to allocate the destinations of all the
> > > > selects to the same register.
> > > >
> > > > What's happening, though, is that you're running into our terrible
> > > > liveness analysis. After doing the proper liveness analysis, we
> > figure
> > > > out the place each register first becomes live and last becomes
> > dead,
> > > > and then we consider registers that have overlapping ranges to
> > > > interfere. So we consider vgrf11 to interfere with vgrf15 and
> > vgrf17,
> > > > even though it really doesn't. The trouble with making it do the
> > right
> > > > thing is that we may actually need to extend the live ranges of
> > > > registers when the exec masks don't match up, either because one
> > uses
> > > > writemask_all or because they have incompatible exec masks due to
> > > > containing different datatypes (half-float vs. float, etc.). For
> > > > example, in your snippet, if vgrf15 were to be written with
> > > > force_writemask_all, then it actually would interfere with vgrf11.
> > But
> > > > right now we're simply being conservative and assuming everything
> > is
> > > > incompatible, which produces extra register pressure in situations
> > > > like the one you have. We have the same problem with FS, and I
> > have
> > > > some idea how to make it better there, but I'm not sure if we
> > would
> > > > care enough to port it to vec4.
> > > >
> > > > So the long answer is that there are some unfortunate features of
> > the
> > > > backend that are making things much worse than they should be wrt
> > > > register allocation, and separating things out like SSA does seems
> > to
> > > > make it worse in certain situations. The short answer is: go fix
> > > > register spilling!
> > > >
> > > > Connor
> > >
> > > I see, thanks for the detailed reply Connor. I'll have a look at the
> > > vec4 register spilling code and Ben's work in that area and see what
> > I
> > > can do.
> > 
> > Thanks for working on that, it is the core problem here.  In the FS,
> > we have an INTEL_DEBUG=spill option that triggers tells it to spill
> > *everything*. That helps a lot when debugging spilling.
> > --Jason
> 
> FYI, I have just fixed this. The problem was not related to register
> spilling though but to incorrect live analysis for conditional writes
> with the select opcode. I have just sent this patch for review:
> http://lists.freedesktop.org/archives/mesa-dev/2015-July/089219.html
> 
> That fixes all the piglit and dEQP tests that failed to link due to
> excessive register spilling with the NIR-vec4 backend.
> 
> I can still have a go at fixing issues with register spilling though.
> Looking at previous work from Ben it looks like he was trying to make it
> work with register sizes >1. Is that what you all had in mind or is
> there something else?
> 
> Iago
> 

The spilling code in general is pretty bad, and I tried to throw in a couple of
quick hacks (like using only 1 VRF for the spill) to alleviate the problem. I
did put some comments here:
http://cgit.freedesktop.org/~bwidawsk/mesa/tree/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp?h=vec4-more-spill3#n355

Anyway, I am far from the expert.


More information about the mesa-dev mailing list