[Mesa-stable] [PATCH v2 32/53] intel/fs: Mark LINTERP opcode as writing accumulator on platforms without PLN

Sat May 26 22:19:48 UTC 2018

On Sat, May 26, 2018 at 2:03 PM, Francisco Jerez <currojerez at riseup.net>
wrote:

> Jason Ekstrand <jason at jlekstrand.net> writes:
>
> > On Sat, May 26, 2018 at 12:21 PM, Francisco Jerez <currojerez at riseup.net
> >
> > wrote:
> >
> >> Jason Ekstrand <jason at jlekstrand.net> writes:
> >>
> >> > On Sat, May 26, 2018 at 11:10 AM, Francisco Jerez <
> currojerez at riseup.net
> >> >
> >> > wrote:
> >> >
> >> >> Jason Ekstrand <jason at jlekstrand.net> writes:
> >> >>
> >> >> > From: Francisco Jerez <currojerez at riseup.net>
> >> >> >
> >> >> > When we don't have PLN (gen4 and gen11+), we implement LINTERP as
> >> either
> >> >> > LINE+MAC or a pair of MADs.  In both cases, the accumulator is
> written
> >> >> > by the first of the two instructions and read by the second.  Even
> >> >> > though the accumulator value isn't actually ever used from a
> logical
> >> >> > instruction perspective, it is trashed so we need to make the
> >> scheduler
> >> >> > aware.  Otherwise, the scheduler could end up re-ordering
> instructions
> >> >> > and putting a LINTERP between another an instruction which writes
> the
> >> >> > accumulator and another which tries to use that result.
> >> >> >
> >> >> > Cc: mesa-stable at lists.freedesktop.org
> >> >> > ---
> >> >> >  src/intel/compiler/brw_shader.cpp | 3 ++-
> >> >> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >> >> >
> >> >> > diff --git a/src/intel/compiler/brw_shader.cpp
> >> b/src/intel/compiler/brw_
> >> >> shader.cpp
> >> >> > index 141b64e..dfd2c5c 100644
> >> >> > --- a/src/intel/compiler/brw_shader.cpp
> >> >> > +++ b/src/intel/compiler/brw_shader.cpp
> >> >> > @@ -984,7 +984,8 @@ backend_instruction::writes_
> >> accumulator_implicitly(const
> >> >> struct gen_device_info
> >> >> >     return writes_accumulator ||
> >> >> >            (devinfo->gen < 6 &&
> >> >> >             ((opcode >= BRW_OPCODE_ADD && opcode < BRW_OPCODE_NOP)
> ||
> >> >> > -            (opcode >= FS_OPCODE_DDX_COARSE && opcode <=
> >> >> FS_OPCODE_LINTERP)));
> >> >> > +            (opcode >= FS_OPCODE_DDX_COARSE && opcode <=
> >> >> FS_OPCODE_LINTERP))) ||
> >> >> > +          (opcode == FS_OPCODE_LINTERP && !devinfo->has_pln);
> >> >>
> >> >> The devinfo->has_pln condition is technically inaccurate, because
> even
> >> >> SNB will fall back to the non-PLN path which overwrites the
> accumulator
> >> >> for certain valid IR, which yeah I'm aware is not *typically*
> >> >> encountered before this series,
> >> >
> >> >
> >> > I'm pretty sure it's impossible before this series because, without
> >> SIMD32,
> >> > the barycentric coordinates always start at g2 and get incremented by
> 2
> >> > every time.  The only other way to get something into the coordinates
> >> > source of the PLN is with a pixel interpolator message.  For that, the
> >> > register allocator has a workaround to ensure that it's assigned an
> even
> >> > register on SNB.  One of the early patches in my series replaces the
> >> broken
> >> > gen4.5-6 PLN fall-back (it didn't work in SIMD16 because it assumed
> the
> >> > wrong register layout for coordinates) with an assert and Jenkins is
> just
> >> > fine with the assert.
> >> >
> >>
> >> I know the conditions for the non-PLN fall-back are not typically
> >> encountered on Gen5-6, but it's still valid IR, so this implementation
> >> of writes_accumulator_implicitly() relies on the behavior of the
> >> register allocator, the NIR-to-i965 translation pass and the rest of the
> >> visitor being exactly what you expect in every possible codepath for
> >> correctness.
> >
> >
> > It's already relied on for correctness because the LINE+MAC fallback that
> > we had before is wrong in SIMD16 (see patch 33).
> >
>
> Yes, I had to do basically the same fix on my SIMD32 branch in order for
> it to pass piglit on Gen4-5.  Still hardly an excuse to make more code
> rely on the same broken assumption which wasn't doing it before.
>

I'm very confused by what you mean here.  The old code assumed the LINE+MAC
layout register layout which is different from the PLN layout.  This means
that the fallback is broken for g45-gen6 because the hardware supports PLN
so we use the PLN layout which arranges thing x1y1x2y2 but the LINE+MAC
fallback assumes x1x2y1y2.  The reason why we need so much more code is
because we have to split it all the way down to 8-wide in order to handle
the PLN layout with LINE+MAC.  Looking at the code in your branch, I have
no idea how it could possibly work correctly.

My point is that the odd register fallback on g4x-gen6 was never used and
didn't work even if it was.

--Jason

> >
> >> That wasn't the case in my original patch, nor in your
> >> series after PATCHv2 33 because this inaccuracy actually becomes a
> >> problem.  Instead of introducing code which we know is dubiously
> >> correct, CC'ing mesa-stable, and then fixing it immediately, why don't
> >> we just do the obviously correct thing from the start?
> >>
> >
> > Sure, we can squash them together if you really want.  But if you're
> > concerned about this restriction being valid in live code, then 33/53
> also
> > needs to go to stable.  I'm fine with that if you're unconvinced by my
> > argument that LINTERP with an odd coordinate never occurs.
> >
> > --Jason
> >
> >
> >> >
> >> >> but why make things more inaccurate than
> >> >> the original only to revert back to a devinfo->gen based check in
> >> >> PATCHv2 33?
> >> >>
> >> >
> >> > See above.
> >> >
> >> >
> >> >> I think I'd squash the last hunk of PATCHv2 33 into this one which
> would
> >> >> give you something functionally equivalent to v1 but updated to
> handle
> >> >> Gen11 correctly.
> >> >>
> >> >> >  }
> >> >> >
> >> >> >  bool
> >> >> > --
> >> >> > 2.5.0.400.gff86faf
> >> >>
> >>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-stable/attachments/20180526/ec99e1f9/attachment.html>