<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Sat, May 26, 2018 at 3:36 PM, Francisco Jerez <span dir="ltr"><<a href="mailto:currojerez@riseup.net" target="_blank">currojerez@riseup.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail-HOEnZb"><div class="gmail-h5">Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>> writes:<br>
<br>
> On Sat, May 26, 2018 at 2:03 PM, Francisco Jerez <<a href="mailto:currojerez@riseup.net">currojerez@riseup.net</a>><br>
> wrote:<br>
><br>
>> Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>> writes:<br>
>><br>
>> > On Sat, May 26, 2018 at 12:21 PM, Francisco Jerez <<a href="mailto:currojerez@riseup.net">currojerez@riseup.net</a><br>
>> ><br>
>> > wrote:<br>
>> ><br>
>> >> Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>> writes:<br>
>> >><br>
>> >> > On Sat, May 26, 2018 at 11:10 AM, Francisco Jerez <<br>
>> <a href="mailto:currojerez@riseup.net">currojerez@riseup.net</a><br>
>> >> ><br>
>> >> > wrote:<br>
>> >> ><br>
>> >> >> Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>> writes:<br>
>> >> >><br>
>> >> >> > From: Francisco Jerez <<a href="mailto:currojerez@riseup.net">currojerez@riseup.net</a>><br>
>> >> >> ><br>
>> >> >> > When we don't have PLN (gen4 and gen11+), we implement LINTERP as<br>
>> >> either<br>
>> >> >> > LINE+MAC or a pair of MADs. In both cases, the accumulator is<br>
>> written<br>
>> >> >> > by the first of the two instructions and read by the second. Even<br>
>> >> >> > though the accumulator value isn't actually ever used from a<br>
>> logical<br>
>> >> >> > instruction perspective, it is trashed so we need to make the<br>
>> >> scheduler<br>
>> >> >> > aware. Otherwise, the scheduler could end up re-ordering<br>
>> instructions<br>
>> >> >> > and putting a LINTERP between another an instruction which writes<br>
>> the<br>
>> >> >> > accumulator and another which tries to use that result.<br>
>> >> >> ><br>
>> >> >> > Cc: <a href="mailto:mesa-stable@lists.freedesktop.org">mesa-stable@lists.freedesktop.<wbr>org</a><br>
>> >> >> > ---<br>
>> >> >> > src/intel/compiler/brw_shader.<wbr>cpp | 3 ++-<br>
>> >> >> > 1 file changed, 2 insertions(+), 1 deletion(-)<br>
>> >> >> ><br>
>> >> >> > diff --git a/src/intel/compiler/brw_<wbr>shader.cpp<br>
>> >> b/src/intel/compiler/brw_<br>
>> >> >> shader.cpp<br>
>> >> >> > index 141b64e..dfd2c5c 100644<br>
>> >> >> > --- a/src/intel/compiler/brw_<wbr>shader.cpp<br>
>> >> >> > +++ b/src/intel/compiler/brw_<wbr>shader.cpp<br>
>> >> >> > @@ -984,7 +984,8 @@ backend_instruction::writes_<br>
>> >> accumulator_implicitly(const<br>
>> >> >> struct gen_device_info<br>
>> >> >> > return writes_accumulator ||<br>
>> >> >> > (devinfo->gen < 6 &&<br>
>> >> >> > ((opcode >= BRW_OPCODE_ADD && opcode < BRW_OPCODE_NOP)<br>
>> ||<br>
>> >> >> > - (opcode >= FS_OPCODE_DDX_COARSE && opcode <=<br>
>> >> >> FS_OPCODE_LINTERP)));<br>
>> >> >> > + (opcode >= FS_OPCODE_DDX_COARSE && opcode <=<br>
>> >> >> FS_OPCODE_LINTERP))) ||<br>
>> >> >> > + (opcode == FS_OPCODE_LINTERP && !devinfo->has_pln);<br>
>> >> >><br>
>> >> >> The devinfo->has_pln condition is technically inaccurate, because<br>
>> even<br>
>> >> >> SNB will fall back to the non-PLN path which overwrites the<br>
>> accumulator<br>
>> >> >> for certain valid IR, which yeah I'm aware is not *typically*<br>
>> >> >> encountered before this series,<br>
>> >> ><br>
>> >> ><br>
>> >> > I'm pretty sure it's impossible before this series because, without<br>
>> >> SIMD32,<br>
>> >> > the barycentric coordinates always start at g2 and get incremented by<br>
>> 2<br>
>> >> > every time. The only other way to get something into the coordinates<br>
>> >> > source of the PLN is with a pixel interpolator message. For that, the<br>
>> >> > register allocator has a workaround to ensure that it's assigned an<br>
>> even<br>
>> >> > register on SNB. One of the early patches in my series replaces the<br>
>> >> broken<br>
>> >> > gen4.5-6 PLN fall-back (it didn't work in SIMD16 because it assumed<br>
>> the<br>
>> >> > wrong register layout for coordinates) with an assert and Jenkins is<br>
>> just<br>
>> >> > fine with the assert.<br>
>> >> ><br>
>> >><br>
>> >> I know the conditions for the non-PLN fall-back are not typically<br>
>> >> encountered on Gen5-6, but it's still valid IR, so this implementation<br>
>> >> of writes_accumulator_implicitly(<wbr>) relies on the behavior of the<br>
>> >> register allocator, the NIR-to-i965 translation pass and the rest of the<br>
>> >> visitor being exactly what you expect in every possible codepath for<br>
>> >> correctness.<br>
>> ><br>
>> ><br>
>> > It's already relied on for correctness because the LINE+MAC fallback that<br>
>> > we had before is wrong in SIMD16 (see patch 33).<br>
>> ><br>
>><br>
>> Yes, I had to do basically the same fix on my SIMD32 branch in order for<br>
>> it to pass piglit on Gen4-5. Still hardly an excuse to make more code<br>
>> rely on the same broken assumption which wasn't doing it before.<br>
>><br>
><br>
> I'm very confused by what you mean here. The old code assumed the LINE+MAC<br>
> layout register layout which is different from the PLN layout. This means<br>
> that the fallback is broken for g45-gen6 because the hardware supports PLN<br>
> so we use the PLN layout which arranges thing x1y1x2y2 but the LINE+MAC<br>
> fallback assumes x1x2y1y2. The reason why we need so much more code is<br>
> because we have to split it all the way down to 8-wide in order to handle<br>
> the PLN layout with LINE+MAC. Looking at the code in your branch, I have<br>
> no idea how it could possibly work correctly.<br>
><br>
<br>
</div></div>It worked correctly with the LINE/MAC path (and in fact it passed piglit<br>
with SIMD32 enabled on Gen4-5) in exactly the same way PATCH 33 of this<br>
series does, but it used a loop instead of the 3x hand-unrolled code of<br>
PATCH 33, and it took advantage of the two accumulator registers for<br>
better pipelining.</blockquote><div><br></div><div>I dug around a bit more in your your branch and I think I know what's going on now. In your patch "i965/fs: Fix LINTERP instruction codegen for misaligned regs on pre-Gen7 hw", you changed the LINE+MAC emit code to use a loop to generate 1, 2, or 4 8-wide instructions. This implicitly switches the assumed layout of barycentric coordinates in the fall-back code to the PLN convention (x1y1x2y2) and breaks SIMD16 on gen4 which lays the barycentric coordinates out as x1x2y1y2. In "i965/fs: Fix Gen4-5 interpolation setup for SIMD32", you silently change the barycentric coordinate layout convention on gen4 to be the PLN convention and it gets fixed again. I'm also very confused about accumulators since the SNB documentation says that we have 16 floats worth (two accumulators) and your branch definitely does 4 8-wide LINE instructions followed by 4 8-wide MAC instructions. I agree that taking advantage of the two accumulators would be good, but the patch I see in your branch seems to be incorrect.</div><div><br></div><div>All in all, changing the gen4 barycentric coordinate convention doesn't seem like all that bad of a thing. It's a bit odd that we lay them out differently on exactly one platform. I suppose, though, that we might be able to generate SIMD16 MADs on gen11 if we started using the LINE convention again. Not sure if it's worth it.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">I can send you the patch as a replacement for PATCH<br>
33 if you care at all about performance of this corner case on Gen4-6...<br>
<span class="gmail-"><br>
> My point is that the odd register fallback on g4x-gen6 was never used and<br>
> didn't work even if it was.<br>
><br>
<br>
</span>My point was that it doesn't matter for correctness of this patch<br>
whether another codepath in the compiler is buggy under approximately<br>
the same conditions. If you think there is some sort of benefit from<br>
committing knowingly broken code just so you can fix it in the next<br>
commit, feel free, but please drop my name from the author tag since I<br>
didn't intend it to.<br>
<div class="gmail-HOEnZb"><div class="gmail-h5"><br>
> --Jason<br>
><br>
><br>
>> ><br>
>> >> That wasn't the case in my original patch, nor in your<br>
>> >> series after PATCHv2 33 because this inaccuracy actually becomes a<br>
>> >> problem. Instead of introducing code which we know is dubiously<br>
>> >> correct, CC'ing mesa-stable, and then fixing it immediately, why don't<br>
>> >> we just do the obviously correct thing from the start?<br>
>> >><br>
>> ><br>
>> > Sure, we can squash them together if you really want. But if you're<br>
>> > concerned about this restriction being valid in live code, then 33/53<br>
>> also<br>
>> > needs to go to stable. I'm fine with that if you're unconvinced by my<br>
>> > argument that LINTERP with an odd coordinate never occurs.<br>
>> ><br>
>> > --Jason<br>
>> ><br>
>> ><br>
>> >> ><br>
>> >> >> but why make things more inaccurate than<br>
>> >> >> the original only to revert back to a devinfo->gen based check in<br>
>> >> >> PATCHv2 33?<br>
>> >> >><br>
>> >> ><br>
>> >> > See above.<br>
>> >> ><br>
>> >> ><br>
>> >> >> I think I'd squash the last hunk of PATCHv2 33 into this one which<br>
>> would<br>
>> >> >> give you something functionally equivalent to v1 but updated to<br>
>> handle<br>
>> >> >> Gen11 correctly.<br>
>> >> >><br>
>> >> >> > }<br>
>> >> >> ><br>
>> >> >> > bool<br>
>> >> >> > --<br>
>> >> >> > 2.5.0.400.gff86faf<br>
>> >> >><br>
>> >><br>
>><br>
</div></div></blockquote></div><br></div></div>