<div dir="ltr"> <div class="gmail_extra"> <div class="gmail_quote">On Thu, Apr 14, 2016 at 1:50 AM, Iago Toral <<a href="mailto:itoral@igalia.com" target="_blank">itoral@igalia.com</a>> wrote: <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Wed, 2016-04-13 at 08:29 -0700, Jason Ekstrand wrote: > > On Apr 13, 2016 7:57 AM, "Connor Abbott" <<a href="mailto:cwabbott0@gmail.com">cwabbott0@gmail.com</a>> wrote: > > > > On Wed, Apr 13, 2016 at 3:24 AM, Iago Toral <<a href="mailto:itoral@igalia.com">itoral@igalia.com</a>> > wrote: > > > On Tue, 2016-04-12 at 13:16 -0400, Connor Abbott wrote: > > >> I'm not sure I'm so comfortable with this. For one, this is the > sort > > >> of thing the backend can (and should) do. Why should be be adding > > >> stuff at the NIR level to help the backend's copy propagation > pass? > > >> And perhaps more importantly, why should we be adding stuff at > the NIR > > >> level to workaround the lack of spilling support for doubles? I > know > > >> it's more difficult to implement than normal spilling, but it > just > > >> feels dirty to workaround it because some tests happen to use too > many > > >> registers. > > > > > > My thoughts for this were that NIR is producing suboptimal output > for > > > this case: it is unpacking and repacking an entire 64-bit float > when it > > > only really needs to operate on the high 32-bit, so fixing that in > NIR, > > > where the issue is introduced, instead of having each backend fix > it, > > > seemed to make more sense to me. > > > > > > I am not sure that backends can easily fix this either: what they > are > > > going to see is a nir_op_pack_double_2x32_split operation, they > don't > > > have any context to tell if the low 32-bit argument is actually > > > different from what is already in the register, other than > emitting code > > > to check for that which kind of defeats the purpose of the > optimization. > > > I suppose they could still see if the instruction that generated > the low > > > 32-bit value is actually an unpack of the same register and do the > right > > > thing in that case. Would you like that solution better? > > > > > > In any case, I don't think this is something for copy-propagation > to > > > handle this is really not about copy-propagating a value, it is > about > > > not needing that value to begin with. Register pressure comes from > > > needing to have the low 32-bit value alive until we recombine it > in the > > > 64-bit value. This patch makes it so that the low 32-bit is not > alive at > > > any moment, we don't need to care about it at any point, and that > is why > > > it helps reduce register pressure. > > I think we have two fundamental problems here. One is that spilling > is broken. The second is that NIR is generating code that the backend > can't handle without spilling. I can totally believe that the later > is happening given that, thanks to the flag, a the backend can't move > two SELs past each other when it schedules and we use SEL very heavily > in the double lowering. > > In this particular case, I do think we have a coalescing problem more > than a need for a new opcode. Thanks to SSA you could just as easily > see a pack where one source is an unpack and do what you're doing with > this "replace the top bits" opcode in fs/vec4_nir without adding a > special opcode for it. Therefore, even if this little optimization > helps, I think the opcode is unneeded. I also think it's probably > solvable as Connor says with better coalescing. </div></div>Yes, this is what I suggested as an alternative to do this in the backend in my original reply to Connor. There is a small caveat here though: we would have to assume that if the low 32-bit operand comes from an unpack opcode, it is an unpack of the same value. We can't verify this precisely because of SSA.</blockquote><div> </div><div>I don't follow. If you have </div><div>ssa_1 = unpack(ssa_0) </div><div>ssa_2 = stuff </div><div>ssa_3 = pack_split(ssa_2, ssa_1.y) </div><div>that should be exactly equivalent to </div><div>ssa_2 = stuff </div><div>ssa_3 = pack_split_y(ssa_0, ssa_2) </div><div>You can easily detect it and just do the slightly simpler thing. </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> It is probably okay to assume this, since I don't imagine situations where we want to unpack a double and repack its low 32-bit into a different double value, but it is a bit of a hack :) </blockquote><div> </div><div>Sounds like we have hacks all round :-) </div><div>I don't know if you're doing this, but one thing that might help substantially with your spilling problems is if we run lower_alu_to_scalar before we lower double ops. </div><div>Is there a branch I can play with that has these spilling problems? If so, what tests do I need to look at? I'm planning to start playing with making spilling work better and would like to fix it for the fp64 case while I'm at it. </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5"> > > Think of what the backend is going to turn this into, though. Say > you > > have a sequence like: > > > > hi = unpack_double_2x32_split_y foo > > > > new_hi = ... > > > > lo = unpack_double_2x32_split_x foo > > bar = pack_double_2x32_split lo, new_hi > > > > then this gets transformed to, in the backend: > > > > hi = foo (stride 2, offset 1) > > > > new_hi = .. > > > > lo = foo (stride 2, offset 0) > > bar (stride 2, offset 0) = lo (stride 2, offset 0) # <- notice this > > bar (stride 2, offset 1) hi (stride 2, offset 1) > > > > Now the move from lo to bar has the same stride, and the same > offset, > > and the same offset, and bar obviously doesn't conflict with lo, so > > register coalescing can combine lo and bar. This will result in: > > > > hi = foo (stride 2, offset 1) > > > > new_hi = .. > > > > bar (stride 2, offset 0) = foo (stride 2, offset 0) > > bar (stride 2, offset 1) hi (stride 2, offset 1) > > > > This is the same code (with the same register pressure) as if you > had > > used pack_double_2x32_split_y. In fact, in the bugzilla bug you gave > > me an example of a shader where this happens, and I pointed it out > to > > you. I think all you need to do to fix your problem is move down the > > unpack_2x32_split_x to right before the pack, since the backend > isn't > > smart enough to reorder it. > > > > In any case, I think that we should get spilling to work on doubles > regardless. > > > > > > > > Jason also mentioned that what we should do is fix the spilling > code. I > > > agree that we should do that, but I think the underlying issue > here is > > > not that, but that NIR's output is leading to inefficient code, > which is > > > then leading to unnecessary spilling in some cases. If we fix the > > > spilling then we can live with the problem, but the problem is > still > > > there: we are still having live values that we don't need taking > some > > > register space and leading to unnecessary spilling. > > > > > > In any case, I understand why you don't like the patch, so I am > fine > > > with holding it back and focus on fixing spilling code first. > FWIW, I > > > already had a stab at that, but I run into something that I didn't > quite > > > figure out at that time: > > > > > > > <a href="https://lists.freedesktop.org/archives/mesa-dev/2016-March/109370.html" rel="noreferrer" target="_blank">https://lists.freedesktop.org/archives/mesa-dev/2016-March/109370.html</a> > > > > I didn't notice this fly by, but you're not doing the spilling > right. > > For things where the stride * type_size is 64, you can't decompose > it > > into two scratch reads/writes, since then the exec mask will be > wrong. > > You only want to read/write a channel if the exec mask is enabled > for > > it. You have to use a different message to spill doubles, since the > > normal one (the scattered read/write) assumes the exec mask channels > > correspond to 32-bit chunks. Jason would know more about this. > > > > > > > > I still have plans to get back to that though. > > > > > > Iago > > > > > >> Jason, what do you think? > > >> > > >> On Tue, Apr 12, 2016 at 4:05 AM, Samuel Iglesias Gonsálvez > > >> <<a href="mailto:siglesias@igalia.com">siglesias@igalia.com</a>> wrote: > > >> > From: Iago Toral Quiroga <<a href="mailto:itoral@igalia.com">itoral@igalia.com</a>> > > >> > > > >> > This is useful when we only need to modify the high 32-bit > chunk of a double. > > >> > This is a common case, because this is the part that encodes > the exponent > > >> > which we manipulate in some double lowering passes. Although we > can accomplish > > >> > the same by using pack_double_2x32, this new opcode is better > for register > > >> > pressure, since we don't have to unpack both parts of the > double and keep > > >> > the low 32-bits around until we can recombine the new exponent. > > >> > > > >> > We will use this opcode in the set_exponent() function of the > double lowering > > >> > pass and with that we will fix spilling issues in some dmat4 > divide > > >> > piglit tests on Intel. > > >> > --- > > >> > src/compiler/nir/nir_opcodes.py | 14 ++++++++++++++ > > >> > 1 file changed, 14 insertions(+) > > >> > > > >> > diff --git a/src/compiler/nir/nir_opcodes.py > b/src/compiler/nir/nir_opcodes.py > > >> > index 9f62e08..f92dd8d 100644 > > >> > --- a/src/compiler/nir/nir_opcodes.py > > >> > +++ b/src/compiler/nir/nir_opcodes.py > > >> > @@ -276,6 +276,20 @@ di.i2 = src0.y; > > >> > dst.x = di.u64; > > >> > """) > > >> > > > >> > +opcode("pack_double_2x32_split_y", 0, tuint64, [0, 0], > [tuint64, tuint32], "", """ > > >> > +union { > > >> > + uint64_t u64; > > >> > + struct { > > >> > + uint32_t i1; > > >> > + uint32_t i2; > > >> > + }; > > >> > +} di; > > >> > + > > >> > +di.u64 = src0; > > >> > +di.i1 = src1; > > >> > +dst = di.u64; > > >> > +""") > > >> > + > > >> > unop_horiz("unpack_double_2x32", 2, tuint32, 1, tuint64, """ > > >> > union { > > >> > uint64_t u64; > > >> > -- > > >> > 2.5.0 > > >> > > > >> > _______________________________________________ > > >> > mesa-dev mailing list > > >> > <a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a> > > >> > <a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/mailman/listinfo/mesa-dev</a> > > >> _______________________________________________ > > >> mesa-dev mailing list > > >> <a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a> > > >> <a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/mailman/listinfo/mesa-dev</a> > > > > > > > > </div></div></blockquote></div> </div></div>