[Mesa-dev] [PATCH 12/13] nir: add pack_double_2x32_split_y opcode

Thu Apr 14 20:31:43 UTC 2016

On Thu, Apr 14, 2016 at 1:50 AM, Iago Toral <itoral at igalia.com> wrote:

> On Wed, 2016-04-13 at 08:29 -0700, Jason Ekstrand wrote:
> >
> > On Apr 13, 2016 7:57 AM, "Connor Abbott" <cwabbott0 at gmail.com> wrote:
> > >
> > > On Wed, Apr 13, 2016 at 3:24 AM, Iago Toral <itoral at igalia.com>
> > wrote:
> > > > On Tue, 2016-04-12 at 13:16 -0400, Connor Abbott wrote:
> > > >> I'm not sure I'm so comfortable with this. For one, this is the
> > sort
> > > >> of thing the backend can (and should) do. Why should be be adding
> > > >> stuff at the NIR level to help the backend's copy propagation
> > pass?
> > > >> And perhaps more importantly, why should we be adding stuff at
> > the NIR
> > > >> level to workaround the lack of spilling support for doubles? I
> > know
> > > >> it's more difficult to implement than normal spilling, but it
> > just
> > > >> feels dirty to workaround it because some tests happen to use too
> > many
> > > >> registers.
> > > >
> > > > My thoughts for this were that NIR is producing suboptimal output
> > for
> > > > this case: it is unpacking and repacking an entire 64-bit float
> > when it
> > > > only really needs to operate on the high 32-bit, so fixing that in
> > NIR,
> > > > where the issue is introduced, instead of having each backend fix
> > it,
> > > > seemed to make more sense to me.
> > > >
> > > > I am not sure that backends can easily fix this either: what they
> > are
> > > > going to see is a nir_op_pack_double_2x32_split operation, they
> > don't
> > > > have any context to tell if the low 32-bit argument is actually
> > > > different from what is already in the register, other than
> > emitting code
> > > > to check for that which kind of defeats the purpose of the
> > optimization.
> > > > I suppose they could still see if the instruction that generated
> > the low
> > > > 32-bit value is actually an unpack of the same register and do the
> > right
> > > > thing in that case. Would you like that solution better?
> > > >
> > > > In any case, I don't think this is something for copy-propagation
> > to
> > > > handle this is really not about copy-propagating a value, it is
> > about
> > > > not needing that value to begin with. Register pressure comes from
> > > > needing to have the low 32-bit value alive until we recombine it
> > in the
> > > > 64-bit value. This patch makes it so that the low 32-bit is not
> > alive at
> > > > any moment, we don't need to care about it at any point, and that
> > is why
> > > > it helps reduce register pressure.
> >
> > I think we have two fundamental problems here.  One is that spilling
> > is broken.  The second is that NIR is generating code that the backend
> > can't handle without spilling.  I can totally believe that the later
> > is happening given that, thanks to the flag, a the backend can't move
> > two SELs past each other when it schedules and we use SEL very heavily
> > in the double lowering.
> >
> > In this particular case, I do think we have a coalescing problem more
> > than a need for a new opcode.  Thanks to SSA you could just as easily
> > see a pack where one source is an unpack and do what you're doing with
> > this "replace the top bits" opcode in fs/vec4_nir without adding a
> > special opcode for it.  Therefore, even if this little optimization
> > helps, I think the opcode is unneeded.  I also think it's probably
> > solvable as Connor says with better coalescing.
>
> Yes, this is what I suggested as an alternative to do this in the
> backend in my original reply to Connor. There is a small caveat here
> though: we would have to assume that if the low 32-bit operand comes
> from an unpack opcode, it is an unpack of the same value. We can't
> verify this precisely because of SSA.

I don't follow.  If you have

ssa_1 = unpack(ssa_0)
ssa_2 = stuff
ssa_3 = pack_split(ssa_2, ssa_1.y)

that should be exactly equivalent to

ssa_2 = stuff
ssa_3 = pack_split_y(ssa_0, ssa_2)

You can easily detect it and just do the slightly simpler thing.

> It is probably okay to assume
> this, since I don't imagine situations where we want to unpack a double
> and repack its low 32-bit into a different double value, but it is a bit
> of a hack :)
>

Sounds like we have hacks all round :-)

I don't know if you're doing this, but one thing that might help
substantially with your spilling problems is if we run lower_alu_to_scalar
before we lower double ops.

Is there a branch I can play with that has these spilling problems?  If so,
what tests do I need to look at?  I'm planning to start playing with making
spilling work better and would like to fix it for the fp64 case while I'm
at it.

> > > Think of what the backend is going to turn this into, though. Say
> > you
> > > have a sequence like:
> > >
> > > hi = unpack_double_2x32_split_y foo
> > >
> > > new_hi = ...
> > >
> > > lo = unpack_double_2x32_split_x foo
> > > bar = pack_double_2x32_split lo, new_hi
> > >
> > > then this gets transformed to, in the backend:
> > >
> > > hi = foo (stride 2, offset 1)
> > >
> > > new_hi  = ..
> > >
> > > lo = foo (stride 2, offset 0)
> > > bar (stride 2, offset 0) = lo (stride 2, offset 0) # <- notice this
> > > bar (stride 2, offset 1) hi (stride 2, offset 1)
> > >
> > > Now the move from lo to bar has the same stride, and the same
> > offset,
> > > and the same offset, and bar obviously doesn't conflict with lo, so
> > > register coalescing can combine lo and bar. This will result in:
> > >
> > > hi = foo (stride 2, offset 1)
> > >
> > > new_hi  = ..
> > >
> > > bar (stride 2, offset 0) = foo (stride 2, offset 0)
> > > bar (stride 2, offset 1) hi (stride 2, offset 1)
> > >
> > > This is the same code (with the same register pressure) as if you
> > had
> > > used pack_double_2x32_split_y. In fact, in the bugzilla bug you gave
> > > me an example of a shader where this happens, and I pointed it out
> > to
> > > you. I think all you need to do to fix your problem is move down the
> > > unpack_2x32_split_x to right before the pack, since the backend
> > isn't
> > > smart enough to reorder it.
> > >
> > > In any case, I think that we should get spilling to work on doubles
> > regardless.
> > >
> > > >
> > > > Jason also mentioned that what we should do is fix the spilling
> > code. I
> > > > agree that we should do that, but I think the underlying issue
> > here is
> > > > not that, but that NIR's output is leading to inefficient code,
> > which is
> > > > then leading to unnecessary spilling in some cases. If we fix the
> > > > spilling then we can live with the problem, but the problem is
> > still
> > > > there: we are still having live values that we don't need taking
> > some
> > > > register space and leading to unnecessary spilling.
> > > >
> > > > In any case, I understand why you don't like the patch, so I am
> > fine
> > > > with holding it back and focus on fixing spilling code first.
> > FWIW, I
> > > > already had a stab at that, but I run into something that I didn't
> > quite
> > > > figure out at that time:
> > > >
> > > >
> > https://lists.freedesktop.org/archives/mesa-dev/2016-March/109370.html
> > >
> > > I didn't notice this fly by, but you're not doing the spilling
> > right.
> > > For things where the stride * type_size is 64, you can't decompose
> > it
> > > into two scratch reads/writes, since then the exec mask will be
> > wrong.
> > > You only want to read/write a channel if the exec mask is enabled
> > for
> > > it. You have to use a different message to spill doubles, since the
> > > normal one (the scattered read/write) assumes the exec mask channels
> > > correspond to 32-bit chunks. Jason would know more about this.
> > >
> > > >
> > > > I still have plans to get back to that though.
> > > >
> > > > Iago
> > > >
> > > >> Jason, what do you think?
> > > >>
> > > >> On Tue, Apr 12, 2016 at 4:05 AM, Samuel Iglesias Gonsálvez
> > > >> <siglesias at igalia.com> wrote:
> > > >> > From: Iago Toral Quiroga <itoral at igalia.com>
> > > >> >
> > > >> > This is useful when we only need to modify the high 32-bit
> > chunk of a double.
> > > >> > This is a common case, because this is the part that encodes
> > the exponent
> > > >> > which we manipulate in some double lowering passes. Although we
> > can accomplish
> > > >> > the same by using pack_double_2x32, this new opcode is better
> > for register
> > > >> > pressure, since we don't have to unpack both parts of the
> > double and keep
> > > >> > the low 32-bits around until we can recombine the new exponent.
> > > >> >
> > > >> > We will use this opcode in the set_exponent() function of the
> > double lowering
> > > >> > pass and with that we will fix spilling issues in some dmat4
> > divide
> > > >> > piglit tests on Intel.
> > > >> > ---
> > > >> >  src/compiler/nir/nir_opcodes.py | 14 ++++++++++++++
> > > >> >  1 file changed, 14 insertions(+)
> > > >> >
> > > >> > diff --git a/src/compiler/nir/nir_opcodes.py
> > b/src/compiler/nir/nir_opcodes.py
> > > >> > index 9f62e08..f92dd8d 100644
> > > >> > --- a/src/compiler/nir/nir_opcodes.py
> > > >> > +++ b/src/compiler/nir/nir_opcodes.py
> > > >> > @@ -276,6 +276,20 @@ di.i2 = src0.y;
> > > >> >  dst.x = di.u64;
> > > >> >  """)
> > > >> >
> > > >> > +opcode("pack_double_2x32_split_y", 0, tuint64, [0, 0],
> > [tuint64, tuint32], "", """
> > > >> > +union {
> > > >> > +    uint64_t u64;
> > > >> > +    struct {
> > > >> > +        uint32_t i1;
> > > >> > +        uint32_t i2;
> > > >> > +    };
> > > >> > +} di;
> > > >> > +
> > > >> > +di.u64 = src0;
> > > >> > +di.i1 = src1;
> > > >> > +dst = di.u64;
> > > >> > +""")
> > > >> > +
> > > >> >  unop_horiz("unpack_double_2x32", 2, tuint32, 1, tuint64, """
> > > >> >  union {
> > > >> >      uint64_t u64;
> > > >> > --
> > > >> > 2.5.0
> > > >> >
> > > >> > _______________________________________________
> > > >> > mesa-dev mailing list
> > > >> > mesa-dev at lists.freedesktop.org
> > > >> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > > >> _______________________________________________
> > > >> mesa-dev mailing list
> > > >> mesa-dev at lists.freedesktop.org
> > > >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > > >
> > > >
> >
> >
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20160414/4278a3a9/attachment.html>