[Mesa-dev] [PATCH v2 06/20] i965/fs: fix dst stride in IVB/BYT type conversions

Thu Jan 19 12:50:28 UTC 2017

On Wed, 2017-01-18 at 12:44 -0800, Francisco Jerez wrote:
> Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
> 
> > On Tue, 2017-01-17 at 13:26 -0800, Francisco Jerez wrote:
> > > Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
> > > 
> > > > From: "Juan A. Suarez Romero" <jasuarez at igalia.com>
> > > > 
> > > > When converting a DF to F, we set dst stride to 2, to fulfil
> > > > alignment
> > > > restrictions.
> > > > 
> > > > But in IVB/BYT, this is not necessary, as each DF conversion
> > > > already
> > > > writes 2 F, the first one the real value, and the second one a
> > > > 0.
> > > > That
> > > > is, IVB/BYT already set stride = 2 implicitly, so we must set
> > > > it to
> > > > 1
> > > > explicitly to avoid ending up with stride = 4.
> > > > 
> > > > v2:
> > > > - Fix typo (Matt)
> > > > ---
> > > >  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 10 ++++++++++
> > > >  1 file changed, 10 insertions(+)
> > > > 
> > > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > > > b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > > > index 45881e3ec95..487f2e90224 100644
> > > > --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > > > +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > > > @@ -1629,6 +1629,16 @@ fs_generator::generate_code(const cfg_t
> > > > *cfg, int dispatch_width)
> > > >  		inst->src[i].type != BRW_REGISTER_TYPE_UD ||
> > > >  		!inst->src[i].negate);
> > > >        }
> > > > +      /* When converting from DF->F, we set destination's
> > > > stride
> > > > as 2 as an
> > > > +       * alignment requirement. But in IVB/BYT, each DF
> > > > implicitly
> > > > writes 2 F,
> > > > +       * being the first one the converted value. So we don't
> > > > need
> > > > to
> > > > +       * explicitly set stride 2, but 1.
> > > > +       */
> > > > +      if (devinfo->gen == 7 && !devinfo->is_haswell &&
> > > > +          type_sz(inst->src[0].type) > type_sz(inst-
> > > > >dst.type)) {
> > > 
> > > This should be exec_type_size(inst) rather than
> > > type_sz(inst->src[0].type).
> > > 
> > 
> > Actually it is going to be the same as this is going to catch only
> > DF
> > -> 32-bit data type conversions. But yeah, you are right.
> > 
> 
> Are you sure?  AFAICT the few lines of this patch were going to be
> executed for *all* instructions, you didn't even have a check for the
> execution type of the instruction being FP64...
> 

Actually this is an error, I fixed it locally before sending this email
 by checking if it is a DF to 32-bit data type conversion and forgot to
mention it.

> > > > +         assert(inst->dst.stride == 2 || inst->dst.stride ==
> > > > 1);
> > > > +         inst->dst.stride = 1;
> > > > +      }
> > > 
> > > This is modifying the IR, please don't.
> > > 
> > 
> > Right, I am going to do the change in brw_eu_emit.c inside the
> > brw_MOV() function that Matt added in other patch and also when
> > emitting the MOV indirect.
> > 
> > > Also I don't think the above has the same semantics as a
> > > destination
> > > region with stride 2...  AFAIUI IVB will just write garbage into
> > > the
> > > odd
> > > channels when the destination type is narrower than a DF which is
> > > really
> > > not what a strided move is supposed to do.  If that's the case it
> > > would
> > > probably be safer to add a new F64TO32 virtual opcode for type
> > > conversions and assert(inst->dst.stride == 1) here...
> > > 
> > 
> > I think adding a virtual opcode for this change is likely too much.
> > I
> > will keep the aforementioned changes. However, I don't have a
> > strong
> > opinion against it: if you prefer the virtual opcode, we can add it
> > now
> > or even later as a follow-up patch.
> > 
> 
> TBH I'm not particularly fond of the idea of adding a virtual
> conversion
> opcode for IVB either, but if you don't, I hope you have some other
> plan
> to deal with the subtle kind of breakage this could cause...  E.g.
> some
> future, seemingly unrelated and obviously correct (because it's the
> codegen pass that will be breaking a contract with this patch)
> optimizer
> improvement could rely on the (seemingly obvious) assumption that
> strided destination regions don't corrupt any of the skipped
> components
> (honestly I'm not fully convinced that we don't rely on this
> already),
> which would probably work fine except on IVB for this infrequently
> used
> feature, and even then the kind of failure mode is likely to be
> non-deterministic and may not be caught by the simplest test-cases.
> 

I see the problem now, thanks for the explanation. However this problem
affects all generations that support doubles as this is not specific to
IVB. From the Broadwell PRM, 3D Media GPGPU, "Double Precision Float to
Single Precision Float":

  "The upper Dword of every Qword will be written with undefined value 
   when converting DF to F."

What makes special to IVB is to use stride 1 instead of 2 in the
destination region.

Then, the virtual conversion code makes more sense now to me because
this way we identify the cases where we have corrupted skipped
components. 

> Some alternatives to adding a virtual opcode:
> 
>  - Continue using regular moves with destination stride 2 as you're
>    doing here, but call the contents of the skipped components
> undefined
>    after the instruction is executed.  Add some simple code to the FS
>    validator pass to warn us if we ever break this assumption -- The
>    most naive implementation of this could just make sure that the
>    destination VGRF of any strided narrowing conversion is only ever
>    written once on IVB (which could conceivably give false positives
> but
>    never false negatives).
> 
>  - Same as above, but instead of the validation pass, write a
>    legalization pass to fix up invalid strided conversions to use a
>    temporary instead of the real destination which would then copied
>    into the real register (this may have benefits of its own because
> the
>    same logic could potentially by used to get rid of an amount of
> cruft
>    from the compiler back-end intended to address execution type
>    alignment restrictions of the destination and sources of various
>    instructions, which gets particularly annoying on CHV/BXT).
> 

Actually, such legalization is done for d2x conversions in
fs_visitor::lower_d2x(), then the added virtual opcode will identify
the temporary conversion to avoid the problem with the assumption done
by future optimizations that you mentioned.

I have a couple of patches, one to add the virtual opcode for the
conversions done by the d2x pass [0] and another virtual opcode to fix
a couple of mov indirects we generated [1].

As we are blocking the release and there more patches for review,
another possibility is to land this patch [2] (replacing this patch)
and then fix it later to not block Mesa 17.0 release more than needed.

What do you think?

Sam

[0] https://github.com/Igalia/mesa/commit/2e98b6238996756ee489adcf98377
d9314de2cc9
   As we have {VEC4,FS}_OPCODE_FROM_DOUBLE, I renamed them later to
SHADER_OPCODE_FROM_DOUBLE https://github.com/Igalia/mesa/commit/57e2a2d
1fbbe2464a42f590589826fe46a341e9d

[1] https://github.com/Igalia/mesa/commit/2307ace353423713c5a9d193fc200
6ca738a841e

[2] Notice this is a temporary patch before doing actual one: https://g
ithub.com/samuelig/mesa/commit/61cd28175178a0c240be1492f9e535c30d908c77

P.S: you can see changes [0] and [1] together in our v3 development
branch:

https://github.com/Igalia/mesa/tree/i965-fp64-gen7-ivb-scalar-vec4-rc3

> > Sam
> > 
> > > >        dst = brw_reg_from_fs_reg(compiler->devinfo, inst,
> > > >                                  &inst->dst, compressed);
> > > >  
> > > > -- 
> > > > 2.11.0
> > > > 
> > > > _______________________________________________
> > > > mesa-dev mailing list
> > > > mesa-dev at lists.freedesktop.org
> > > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev