[Mesa-dev] [PATCH v2 06/20] i965/fs: fix dst stride in IVB/BYT type conversions

Fri Jan 20 06:53:40 UTC 2017

On Thu, 2017-01-19 at 15:00 -0800, Francisco Jerez wrote:
> Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
> 
> > On Wed, 2017-01-18 at 12:44 -0800, Francisco Jerez wrote:
> > > Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
> > > 
> > > > On Tue, 2017-01-17 at 13:26 -0800, Francisco Jerez wrote:
> > > > > Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
> > > > > 
> > > > > > From: "Juan A. Suarez Romero" <jasuarez at igalia.com>
> > > > > > 
> > > > > > When converting a DF to F, we set dst stride to 2, to
> > > > > > fulfil
> > > > > > alignment
> > > > > > restrictions.
> > > > > > 
> > > > > > But in IVB/BYT, this is not necessary, as each DF
> > > > > > conversion
> > > > > > already
> > > > > > writes 2 F, the first one the real value, and the second
> > > > > > one a
> > > > > > 0.
> > > > > > That
> > > > > > is, IVB/BYT already set stride = 2 implicitly, so we must
> > > > > > set
> > > > > > it to
> > > > > > 1
> > > > > > explicitly to avoid ending up with stride = 4.
> > > > > > 
> > > > > > v2:
> > > > > > - Fix typo (Matt)
> > > > > > ---
> > > > > >  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 10
> > > > > > ++++++++++
> > > > > >  1 file changed, 10 insertions(+)
> > > > > > 
> > > > > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > > > > > b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > > > > > index 45881e3ec95..487f2e90224 100644
> > > > > > --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > > > > > +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > > > > > @@ -1629,6 +1629,16 @@ fs_generator::generate_code(const
> > > > > > cfg_t
> > > > > > *cfg, int dispatch_width)
> > > > > >  		inst->src[i].type != BRW_REGISTER_TYPE_UD
> > > > > > ||
> > > > > >  		!inst->src[i].negate);
> > > > > >        }
> > > > > > +      /* When converting from DF->F, we set destination's
> > > > > > stride
> > > > > > as 2 as an
> > > > > > +       * alignment requirement. But in IVB/BYT, each DF
> > > > > > implicitly
> > > > > > writes 2 F,
> > > > > > +       * being the first one the converted value. So we
> > > > > > don't
> > > > > > need
> > > > > > to
> > > > > > +       * explicitly set stride 2, but 1.
> > > > > > +       */
> > > > > > +      if (devinfo->gen == 7 && !devinfo->is_haswell &&
> > > > > > +          type_sz(inst->src[0].type) > type_sz(inst-
> > > > > > > dst.type)) {
> > > > > 
> > > > > This should be exec_type_size(inst) rather than
> > > > > type_sz(inst->src[0].type).
> > > > > 
> > > > 
> > > > Actually it is going to be the same as this is going to catch
> > > > only
> > > > DF
> > > > -> 32-bit data type conversions. But yeah, you are right.
> > > > 
> > > 
> > > Are you sure?  AFAICT the few lines of this patch were going to
> > > be
> > > executed for *all* instructions, you didn't even have a check for
> > > the
> > > execution type of the instruction being FP64...
> > > 
> > 
> > Actually this is an error, I fixed it locally before sending this
> > email
> >  by checking if it is a DF to 32-bit data type conversion and
> > forgot to
> > mention it.
> > 
> 
> Yes, not checking for FP64 was an error, but applying this workaround
> regardless of the instruction opcode seemed reasonable, that way
> you'd
> avoid having to take this into account for every combination of
> double
> precision opcode (apparently two in this series) and destination type
> you end up using.
> 

Right.

> > > > > > +         assert(inst->dst.stride == 2 || inst->dst.stride
> > > > > > ==
> > > > > > 1);
> > > > > > +         inst->dst.stride = 1;
> > > > > > +      }
> > > > > 
> > > > > This is modifying the IR, please don't.
> > > > > 
> > > > 
> > > > Right, I am going to do the change in brw_eu_emit.c inside the
> > > > brw_MOV() function that Matt added in other patch and also when
> > > > emitting the MOV indirect.
> > > > 
> > > > > Also I don't think the above has the same semantics as a
> > > > > destination
> > > > > region with stride 2...  AFAIUI IVB will just write garbage
> > > > > into
> > > > > the
> > > > > odd
> > > > > channels when the destination type is narrower than a DF
> > > > > which is
> > > > > really
> > > > > not what a strided move is supposed to do.  If that's the
> > > > > case it
> > > > > would
> > > > > probably be safer to add a new F64TO32 virtual opcode for
> > > > > type
> > > > > conversions and assert(inst->dst.stride == 1) here...
> > > > > 
> > > > 
> > > > I think adding a virtual opcode for this change is likely too
> > > > much.
> > > > I
> > > > will keep the aforementioned changes. However, I don't have a
> > > > strong
> > > > opinion against it: if you prefer the virtual opcode, we can
> > > > add it
> > > > now
> > > > or even later as a follow-up patch.
> > > > 
> > > 
> > > TBH I'm not particularly fond of the idea of adding a virtual
> > > conversion
> > > opcode for IVB either, but if you don't, I hope you have some
> > > other
> > > plan
> > > to deal with the subtle kind of breakage this could
> > > cause...  E.g.
> > > some
> > > future, seemingly unrelated and obviously correct (because it's
> > > the
> > > codegen pass that will be breaking a contract with this patch)
> > > optimizer
> > > improvement could rely on the (seemingly obvious) assumption that
> > > strided destination regions don't corrupt any of the skipped
> > > components
> > > (honestly I'm not fully convinced that we don't rely on this
> > > already),
> > > which would probably work fine except on IVB for this
> > > infrequently
> > > used
> > > feature, and even then the kind of failure mode is likely to be
> > > non-deterministic and may not be caught by the simplest test-
> > > cases.
> > > 
> > 
> > I see the problem now, thanks for the explanation. However this
> > problem
> > affects all generations that support doubles as this is not
> > specific to
> > IVB. From the Broadwell PRM, 3D Media GPGPU, "Double Precision
> > Float to
> > Single Precision Float":
> > 
> >   "The upper Dword of every Qword will be written with
> > undefined value 
> >    when converting DF to F."
> > 
> 
> Shoot, that sucks, but makes the matter even more compelling.
> 
> > What makes special to IVB is to use stride 1 instead of 2 in the
> > destination region.
> > 
> > Then, the virtual conversion code makes more sense now to me
> > because
> > this way we identify the cases where we have corrupted skipped
> > components. 
> > 
> > > Some alternatives to adding a virtual opcode:
> > > 
> > >  - Continue using regular moves with destination stride 2 as
> > > you're
> > >    doing here, but call the contents of the skipped components
> > > undefined
> > >    after the instruction is executed.  Add some simple code to
> > > the FS
> > >    validator pass to warn us if we ever break this assumption --
> > > The
> > >    most naive implementation of this could just make sure that
> > > the
> > >    destination VGRF of any strided narrowing conversion is only
> > > ever
> > >    written once on IVB (which could conceivably give false
> > > positives
> > > but
> > >    never false negatives).
> > > 
> > >  - Same as above, but instead of the validation pass, write a
> > >    legalization pass to fix up invalid strided conversions to use
> > > a
> > >    temporary instead of the real destination which would then
> > > copied
> > >    into the real register (this may have benefits of its own
> > > because
> > > the
> > >    same logic could potentially by used to get rid of an amount
> > > of
> > > cruft
> > >    from the compiler back-end intended to address execution type
> > >    alignment restrictions of the destination and sources of
> > > various
> > >    instructions, which gets particularly annoying on CHV/BXT).
> > > 
> > 
> > Actually, such legalization is done for d2x conversions in
> > fs_visitor::lower_d2x(), then the added virtual opcode will
> > identify
> > the temporary conversion to avoid the problem with the assumption
> > done
> > by future optimizations that you mentioned.
> > 
> > I have a couple of patches, one to add the virtual opcode for the
> > conversions done by the d2x pass [0] and another virtual opcode to
> > fix
> > a couple of mov indirects we generated [1].
> > 
> 
> I don't think the FS_OPCODE_FROM_DOUBLE opcode as you defined it in
> [0]
> would substantially improves the situation, because you still use a
> strided 32-bit destination so you'll be breaking the exact same
> assumption MOV instructions were breaking.
>
> That said, because you have a legalization pass in place already, the
> virtual opcode seems unnecessary to isolate the optimizer from this
> hardware bogosity.  Let's just make sure that the pass kicks in in
> all
> cases where this could be a problem, e.g. for any double-precision
> instructions where the destination type is narrower than the
> execution
> type.
> 

OK. Thanks!

Sam

> > As we are blocking the release and there more patches for review,
> > another possibility is to land this patch [2] (replacing this
> > patch)
> > and then fix it later to not block Mesa 17.0 release more than
> > needed.
> > 
> > What do you think?
> > 
> > Sam
> > 
> > 
> > [0] https://github.com/Igalia/mesa/commit/2e98b6238996756ee489adcf9
> > 8377
> > d9314de2cc9
> >    As we have {VEC4,FS}_OPCODE_FROM_DOUBLE, I renamed them later to
> > SHADER_OPCODE_FROM_DOUBLE https://github.com/Igalia/mesa/commit/57e
> > 2a2d
> > 1fbbe2464a42f590589826fe46a341e9d
> > 
> > [1] https://github.com/Igalia/mesa/commit/2307ace353423713c5a9d193f
> > c200
> > 6ca738a841e
> > 
> > [2] Notice this is a temporary patch before doing actual one:
> > https://g
> > ithub.com/samuelig/mesa/commit/61cd28175178a0c240be1492f9e535c30d90
> > 8c77
> > 
> > 
> > P.S: you can see changes [0] and [1] together in our v3 development
> > branch:
> > 
> > https://github.com/Igalia/mesa/tree/i965-fp64-gen7-ivb-scalar-vec4-
> > rc3
> > 
> > > > Sam
> > > > 
> > > > > >        dst = brw_reg_from_fs_reg(compiler->devinfo, inst,
> > > > > >                                  &inst->dst, compressed);
> > > > > >  
> > > > > > -- 
> > > > > > 2.11.0
> > > > > > 
> > > > > > _______________________________________________
> > > > > > mesa-dev mailing list
> > > > > > mesa-dev at lists.freedesktop.org
> > > > > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170120/a39d2deb/attachment.sig>