<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Sat, May 26, 2018 at 2:03 PM, Francisco Jerez <span dir="ltr"><<a href="mailto:currojerez@riseup.net" target="_blank">currojerez@riseup.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="m_4448260923226074820HOEnZb"><div class="m_4448260923226074820h5">Jason Ekstrand <<a href="mailto:jason@jlekstrand.net" target="_blank">jason@jlekstrand.net</a>> writes:<br> <br> > On Sat, May 26, 2018 at 12:21 PM, Francisco Jerez <<a href="mailto:currojerez@riseup.net" target="_blank">currojerez@riseup.net</a>><br> > wrote:<br> ><br> >> Jason Ekstrand <<a href="mailto:jason@jlekstrand.net" target="_blank">jason@jlekstrand.net</a>> writes:<br> >><br> >> > On Sat, May 26, 2018 at 11:10 AM, Francisco Jerez <<a href="mailto:currojerez@riseup.net" target="_blank">currojerez@riseup.net</a><br> >> ><br> >> > wrote:<br> >> ><br> >> >> Jason Ekstrand <<a href="mailto:jason@jlekstrand.net" target="_blank">jason@jlekstrand.net</a>> writes:<br> >> >><br> >> >> > From: Francisco Jerez <<a href="mailto:currojerez@riseup.net" target="_blank">currojerez@riseup.net</a>><br> >> >> ><br> >> >> > When we don't have PLN (gen4 and gen11+), we implement LINTERP as<br> >> either<br> >> >> > LINE+MAC or a pair of MADs. In both cases, the accumulator is written<br> >> >> > by the first of the two instructions and read by the second. Even<br> >> >> > though the accumulator value isn't actually ever used from a logical<br> >> >> > instruction perspective, it is trashed so we need to make the<br> >> scheduler<br> >> >> > aware. Otherwise, the scheduler could end up re-ordering instructions<br> >> >> > and putting a LINTERP between another an instruction which writes the<br> >> >> > accumulator and another which tries to use that result.<br> >> >> ><br> >> >> > Cc: <a href="mailto:mesa-stable@lists.freedesktop.org" target="_blank">mesa-stable@lists.freedesktop.<wbr>org</a><br> >> >> > ---<br> >> >> > src/intel/compiler/brw_shader.<wbr>cpp | 3 ++-<br> >> >> > 1 file changed, 2 insertions(+), 1 deletion(-)<br> >> >> ><br> >> >> > diff --git a/src/intel/compiler/brw_shade<wbr>r.cpp<br> >> b/src/intel/compiler/brw_<br> >> >> shader.cpp<br> >> >> > index 141b64e..dfd2c5c 100644<br> >> >> > --- a/src/intel/compiler/brw_shade<wbr>r.cpp<br> >> >> > +++ b/src/intel/compiler/brw_shade<wbr>r.cpp<br> >> >> > @@ -984,7 +984,8 @@ backend_instruction::writes_<br> >> accumulator_implicitly(const<br> >> >> struct gen_device_info<br> >> >> > return writes_accumulator ||<br> >> >> > (devinfo->gen < 6 &&<br> >> >> > ((opcode >= BRW_OPCODE_ADD && opcode < BRW_OPCODE_NOP) ||<br> >> >> > - (opcode >= FS_OPCODE_DDX_COARSE && opcode <=<br> >> >> FS_OPCODE_LINTERP)));<br> >> >> > + (opcode >= FS_OPCODE_DDX_COARSE && opcode <=<br> >> >> FS_OPCODE_LINTERP))) ||<br> >> >> > + (opcode == FS_OPCODE_LINTERP && !devinfo->has_pln);<br> >> >><br> >> >> The devinfo->has_pln condition is technically inaccurate, because even<br> >> >> SNB will fall back to the non-PLN path which overwrites the accumulator<br> >> >> for certain valid IR, which yeah I'm aware is not *typically*<br> >> >> encountered before this series,<br> >> ><br> >> ><br> >> > I'm pretty sure it's impossible before this series because, without<br> >> SIMD32,<br> >> > the barycentric coordinates always start at g2 and get incremented by 2<br> >> > every time. The only other way to get something into the coordinates<br> >> > source of the PLN is with a pixel interpolator message. For that, the<br> >> > register allocator has a workaround to ensure that it's assigned an even<br> >> > register on SNB. One of the early patches in my series replaces the<br> >> broken<br> >> > gen4.5-6 PLN fall-back (it didn't work in SIMD16 because it assumed the<br> >> > wrong register layout for coordinates) with an assert and Jenkins is just<br> >> > fine with the assert.<br> >> ><br> >><br> >> I know the conditions for the non-PLN fall-back are not typically<br> >> encountered on Gen5-6, but it's still valid IR, so this implementation<br> >> of writes_accumulator_implicitly(<wbr>) relies on the behavior of the<br> >> register allocator, the NIR-to-i965 translation pass and the rest of the<br> >> visitor being exactly what you expect in every possible codepath for<br> >> correctness.<br> ><br> ><br> > It's already relied on for correctness because the LINE+MAC fallback that<br> > we had before is wrong in SIMD16 (see patch 33).<br> ><br> <br> </div></div>Yes, I had to do basically the same fix on my SIMD32 branch in order for<br> it to pass piglit on Gen4-5. Still hardly an excuse to make more code<br> rely on the same broken assumption which wasn't doing it before.<br><div class="m_4448260923226074820HOEnZb"><div class="m_4448260923226074820h5"></div></div></blockquote><div><br></div><div>I'm very confused by what you mean here. The old code assumed the LINE+MAC layout register layout which is different from the PLN layout. This means that the fallback is broken for g45-gen6 because the hardware supports PLN so we use the PLN layout which arranges thing x1y1x2y2 but the LINE+MAC fallback assumes x1x2y1y2. The reason why we need so much more code is because we have to split it all the way down to 8-wide in order to handle the PLN layout with LINE+MAC. Looking at the code in your branch, I have no idea how it could possibly work correctly.</div><div><br></div><div>My point is that the odd register fallback on g4x-gen6 was never used and didn't work even if it was.</div><div><br></div><div>--Jason<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="m_4448260923226074820HOEnZb"><div class="m_4448260923226074820h5"> ><br> >> That wasn't the case in my original patch, nor in your<br> >> series after PATCHv2 33 because this inaccuracy actually becomes a<br> >> problem. Instead of introducing code which we know is dubiously<br> >> correct, CC'ing mesa-stable, and then fixing it immediately, why don't<br> >> we just do the obviously correct thing from the start?<br> >><br> ><br> > Sure, we can squash them together if you really want. But if you're<br> > concerned about this restriction being valid in live code, then 33/53 also<br> > needs to go to stable. I'm fine with that if you're unconvinced by my<br> > argument that LINTERP with an odd coordinate never occurs.<br> ><br> > --Jason<br> ><br> ><br> >> ><br> >> >> but why make things more inaccurate than<br> >> >> the original only to revert back to a devinfo->gen based check in<br> >> >> PATCHv2 33?<br> >> >><br> >> ><br> >> > See above.<br> >> ><br> >> ><br> >> >> I think I'd squash the last hunk of PATCHv2 33 into this one which would<br> >> >> give you something functionally equivalent to v1 but updated to handle<br> >> >> Gen11 correctly.<br> >> >><br> >> >> > }<br> >> >> ><br> >> >> > bool<br> >> >> > --<br> >> >> > 2.5.0.400.gff86faf<br> >> >><br> >><br> </div></div></blockquote></div><br></div></div>