[Mesa-dev] [PATCH 12/13] nir: add pack_double_2x32_split_y opcode

Thu Apr 14 08:50:02 UTC 2016

On Wed, 2016-04-13 at 08:29 -0700, Jason Ekstrand wrote:
> 
> On Apr 13, 2016 7:57 AM, "Connor Abbott" <cwabbott0 at gmail.com> wrote:
> >
> > On Wed, Apr 13, 2016 at 3:24 AM, Iago Toral <itoral at igalia.com>
> wrote:
> > > On Tue, 2016-04-12 at 13:16 -0400, Connor Abbott wrote:
> > >> I'm not sure I'm so comfortable with this. For one, this is the
> sort
> > >> of thing the backend can (and should) do. Why should be be adding
> > >> stuff at the NIR level to help the backend's copy propagation
> pass?
> > >> And perhaps more importantly, why should we be adding stuff at
> the NIR
> > >> level to workaround the lack of spilling support for doubles? I
> know
> > >> it's more difficult to implement than normal spilling, but it
> just
> > >> feels dirty to workaround it because some tests happen to use too
> many
> > >> registers.
> > >
> > > My thoughts for this were that NIR is producing suboptimal output
> for
> > > this case: it is unpacking and repacking an entire 64-bit float
> when it
> > > only really needs to operate on the high 32-bit, so fixing that in
> NIR,
> > > where the issue is introduced, instead of having each backend fix
> it,
> > > seemed to make more sense to me.
> > >
> > > I am not sure that backends can easily fix this either: what they
> are
> > > going to see is a nir_op_pack_double_2x32_split operation, they
> don't
> > > have any context to tell if the low 32-bit argument is actually
> > > different from what is already in the register, other than
> emitting code
> > > to check for that which kind of defeats the purpose of the
> optimization.
> > > I suppose they could still see if the instruction that generated
> the low
> > > 32-bit value is actually an unpack of the same register and do the
> right
> > > thing in that case. Would you like that solution better?
> > >
> > > In any case, I don't think this is something for copy-propagation
> to
> > > handle this is really not about copy-propagating a value, it is
> about
> > > not needing that value to begin with. Register pressure comes from
> > > needing to have the low 32-bit value alive until we recombine it
> in the
> > > 64-bit value. This patch makes it so that the low 32-bit is not
> alive at
> > > any moment, we don't need to care about it at any point, and that
> is why
> > > it helps reduce register pressure.
> 
> I think we have two fundamental problems here.  One is that spilling
> is broken.  The second is that NIR is generating code that the backend
> can't handle without spilling.  I can totally believe that the later
> is happening given that, thanks to the flag, a the backend can't move
> two SELs past each other when it schedules and we use SEL very heavily
> in the double lowering.
> 
> In this particular case, I do think we have a coalescing problem more
> than a need for a new opcode.  Thanks to SSA you could just as easily
> see a pack where one source is an unpack and do what you're doing with
> this "replace the top bits" opcode in fs/vec4_nir without adding a
> special opcode for it.  Therefore, even if this little optimization
> helps, I think the opcode is unneeded.  I also think it's probably
> solvable as Connor says with better coalescing.

Yes, this is what I suggested as an alternative to do this in the
backend in my original reply to Connor. There is a small caveat here
though: we would have to assume that if the low 32-bit operand comes
from an unpack opcode, it is an unpack of the same value. We can't
verify this precisely because of SSA. It is probably okay to assume
this, since I don't imagine situations where we want to unpack a double
and repack its low 32-bit into a different double value, but it is a bit
of a hack :)

> > Think of what the backend is going to turn this into, though. Say
> you
> > have a sequence like:
> >
> > hi = unpack_double_2x32_split_y foo
> >
> > new_hi = ...
> >
> > lo = unpack_double_2x32_split_x foo
> > bar = pack_double_2x32_split lo, new_hi
> >
> > then this gets transformed to, in the backend:
> >
> > hi = foo (stride 2, offset 1)
> >
> > new_hi  = ..
> >
> > lo = foo (stride 2, offset 0)
> > bar (stride 2, offset 0) = lo (stride 2, offset 0) # <- notice this
> > bar (stride 2, offset 1) hi (stride 2, offset 1)
> >
> > Now the move from lo to bar has the same stride, and the same
> offset,
> > and the same offset, and bar obviously doesn't conflict with lo, so
> > register coalescing can combine lo and bar. This will result in:
> >
> > hi = foo (stride 2, offset 1)
> >
> > new_hi  = ..
> >
> > bar (stride 2, offset 0) = foo (stride 2, offset 0)
> > bar (stride 2, offset 1) hi (stride 2, offset 1)
> >
> > This is the same code (with the same register pressure) as if you
> had
> > used pack_double_2x32_split_y. In fact, in the bugzilla bug you gave
> > me an example of a shader where this happens, and I pointed it out
> to
> > you. I think all you need to do to fix your problem is move down the
> > unpack_2x32_split_x to right before the pack, since the backend
> isn't
> > smart enough to reorder it.
> >
> > In any case, I think that we should get spilling to work on doubles
> regardless.
> >
> > >
> > > Jason also mentioned that what we should do is fix the spilling
> code. I
> > > agree that we should do that, but I think the underlying issue
> here is
> > > not that, but that NIR's output is leading to inefficient code,
> which is
> > > then leading to unnecessary spilling in some cases. If we fix the
> > > spilling then we can live with the problem, but the problem is
> still
> > > there: we are still having live values that we don't need taking
> some
> > > register space and leading to unnecessary spilling.
> > >
> > > In any case, I understand why you don't like the patch, so I am
> fine
> > > with holding it back and focus on fixing spilling code first.
> FWIW, I
> > > already had a stab at that, but I run into something that I didn't
> quite
> > > figure out at that time:
> > >
> > >
> https://lists.freedesktop.org/archives/mesa-dev/2016-March/109370.html
> >
> > I didn't notice this fly by, but you're not doing the spilling
> right.
> > For things where the stride * type_size is 64, you can't decompose
> it
> > into two scratch reads/writes, since then the exec mask will be
> wrong.
> > You only want to read/write a channel if the exec mask is enabled
> for
> > it. You have to use a different message to spill doubles, since the
> > normal one (the scattered read/write) assumes the exec mask channels
> > correspond to 32-bit chunks. Jason would know more about this.
> >
> > >
> > > I still have plans to get back to that though.
> > >
> > > Iago
> > >
> > >> Jason, what do you think?
> > >>
> > >> On Tue, Apr 12, 2016 at 4:05 AM, Samuel Iglesias Gonsálvez
> > >> <siglesias at igalia.com> wrote:
> > >> > From: Iago Toral Quiroga <itoral at igalia.com>
> > >> >
> > >> > This is useful when we only need to modify the high 32-bit
> chunk of a double.
> > >> > This is a common case, because this is the part that encodes
> the exponent
> > >> > which we manipulate in some double lowering passes. Although we
> can accomplish
> > >> > the same by using pack_double_2x32, this new opcode is better
> for register
> > >> > pressure, since we don't have to unpack both parts of the
> double and keep
> > >> > the low 32-bits around until we can recombine the new exponent.
> > >> >
> > >> > We will use this opcode in the set_exponent() function of the
> double lowering
> > >> > pass and with that we will fix spilling issues in some dmat4
> divide
> > >> > piglit tests on Intel.
> > >> > ---
> > >> >  src/compiler/nir/nir_opcodes.py | 14 ++++++++++++++
> > >> >  1 file changed, 14 insertions(+)
> > >> >
> > >> > diff --git a/src/compiler/nir/nir_opcodes.py
> b/src/compiler/nir/nir_opcodes.py
> > >> > index 9f62e08..f92dd8d 100644
> > >> > --- a/src/compiler/nir/nir_opcodes.py
> > >> > +++ b/src/compiler/nir/nir_opcodes.py
> > >> > @@ -276,6 +276,20 @@ di.i2 = src0.y;
> > >> >  dst.x = di.u64;
> > >> >  """)
> > >> >
> > >> > +opcode("pack_double_2x32_split_y", 0, tuint64, [0, 0],
> [tuint64, tuint32], "", """
> > >> > +union {
> > >> > +    uint64_t u64;
> > >> > +    struct {
> > >> > +        uint32_t i1;
> > >> > +        uint32_t i2;
> > >> > +    };
> > >> > +} di;
> > >> > +
> > >> > +di.u64 = src0;
> > >> > +di.i1 = src1;
> > >> > +dst = di.u64;
> > >> > +""")
> > >> > +
> > >> >  unop_horiz("unpack_double_2x32", 2, tuint32, 1, tuint64, """
> > >> >  union {
> > >> >      uint64_t u64;
> > >> > --
> > >> > 2.5.0
> > >> >
> > >> > _______________________________________________
> > >> > mesa-dev mailing list
> > >> > mesa-dev at lists.freedesktop.org
> > >> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > >> _______________________________________________
> > >> mesa-dev mailing list
> > >> mesa-dev at lists.freedesktop.org
> > >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > >
> > >
> 
>