[Mesa-dev] [PATCH v2 06/20] i965/fs: fix dst stride in IVB/BYT type conversions

Thu Jan 19 23:00:49 UTC 2017

Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:

> On Wed, 2017-01-18 at 12:44 -0800, Francisco Jerez wrote:
>> Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
>> 
>> > On Tue, 2017-01-17 at 13:26 -0800, Francisco Jerez wrote:
>> > > Samuel Iglesias Gonsálvez <siglesias at igalia.com> writes:
>> > > 
>> > > > From: "Juan A. Suarez Romero" <jasuarez at igalia.com>
>> > > > 
>> > > > When converting a DF to F, we set dst stride to 2, to fulfil
>> > > > alignment
>> > > > restrictions.
>> > > > 
>> > > > But in IVB/BYT, this is not necessary, as each DF conversion
>> > > > already
>> > > > writes 2 F, the first one the real value, and the second one a
>> > > > 0.
>> > > > That
>> > > > is, IVB/BYT already set stride = 2 implicitly, so we must set
>> > > > it to
>> > > > 1
>> > > > explicitly to avoid ending up with stride = 4.
>> > > > 
>> > > > v2:
>> > > > - Fix typo (Matt)
>> > > > ---
>> > > >  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 10 ++++++++++
>> > > >  1 file changed, 10 insertions(+)
>> > > > 
>> > > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> > > > b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> > > > index 45881e3ec95..487f2e90224 100644
>> > > > --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> > > > +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> > > > @@ -1629,6 +1629,16 @@ fs_generator::generate_code(const cfg_t
>> > > > *cfg, int dispatch_width)
>> > > >  		inst->src[i].type != BRW_REGISTER_TYPE_UD ||
>> > > >  		!inst->src[i].negate);
>> > > >        }
>> > > > +      /* When converting from DF->F, we set destination's
>> > > > stride
>> > > > as 2 as an
>> > > > +       * alignment requirement. But in IVB/BYT, each DF
>> > > > implicitly
>> > > > writes 2 F,
>> > > > +       * being the first one the converted value. So we don't
>> > > > need
>> > > > to
>> > > > +       * explicitly set stride 2, but 1.
>> > > > +       */
>> > > > +      if (devinfo->gen == 7 && !devinfo->is_haswell &&
>> > > > +          type_sz(inst->src[0].type) > type_sz(inst-
>> > > > >dst.type)) {
>> > > 
>> > > This should be exec_type_size(inst) rather than
>> > > type_sz(inst->src[0].type).
>> > > 
>> > 
>> > Actually it is going to be the same as this is going to catch only
>> > DF
>> > -> 32-bit data type conversions. But yeah, you are right.
>> > 
>> 
>> Are you sure?  AFAICT the few lines of this patch were going to be
>> executed for *all* instructions, you didn't even have a check for the
>> execution type of the instruction being FP64...
>> 
>
> Actually this is an error, I fixed it locally before sending this email
>  by checking if it is a DF to 32-bit data type conversion and forgot to
> mention it.
>
Yes, not checking for FP64 was an error, but applying this workaround
regardless of the instruction opcode seemed reasonable, that way you'd
avoid having to take this into account for every combination of double
precision opcode (apparently two in this series) and destination type
you end up using.

>> > > > +         assert(inst->dst.stride == 2 || inst->dst.stride ==
>> > > > 1);
>> > > > +         inst->dst.stride = 1;
>> > > > +      }
>> > > 
>> > > This is modifying the IR, please don't.
>> > > 
>> > 
>> > Right, I am going to do the change in brw_eu_emit.c inside the
>> > brw_MOV() function that Matt added in other patch and also when
>> > emitting the MOV indirect.
>> > 
>> > > Also I don't think the above has the same semantics as a
>> > > destination
>> > > region with stride 2...  AFAIUI IVB will just write garbage into
>> > > the
>> > > odd
>> > > channels when the destination type is narrower than a DF which is
>> > > really
>> > > not what a strided move is supposed to do.  If that's the case it
>> > > would
>> > > probably be safer to add a new F64TO32 virtual opcode for type
>> > > conversions and assert(inst->dst.stride == 1) here...
>> > > 
>> > 
>> > I think adding a virtual opcode for this change is likely too much.
>> > I
>> > will keep the aforementioned changes. However, I don't have a
>> > strong
>> > opinion against it: if you prefer the virtual opcode, we can add it
>> > now
>> > or even later as a follow-up patch.
>> > 
>> 
>> TBH I'm not particularly fond of the idea of adding a virtual
>> conversion
>> opcode for IVB either, but if you don't, I hope you have some other
>> plan
>> to deal with the subtle kind of breakage this could cause...  E.g.
>> some
>> future, seemingly unrelated and obviously correct (because it's the
>> codegen pass that will be breaking a contract with this patch)
>> optimizer
>> improvement could rely on the (seemingly obvious) assumption that
>> strided destination regions don't corrupt any of the skipped
>> components
>> (honestly I'm not fully convinced that we don't rely on this
>> already),
>> which would probably work fine except on IVB for this infrequently
>> used
>> feature, and even then the kind of failure mode is likely to be
>> non-deterministic and may not be caught by the simplest test-cases.
>> 
>
> I see the problem now, thanks for the explanation. However this problem
> affects all generations that support doubles as this is not specific to
> IVB. From the Broadwell PRM, 3D Media GPGPU, "Double Precision Float to
> Single Precision Float":
>
>   "The upper Dword of every Qword will be written with undefined value 
>    when converting DF to F."
>

Shoot, that sucks, but makes the matter even more compelling.

> What makes special to IVB is to use stride 1 instead of 2 in the
> destination region.
>
> Then, the virtual conversion code makes more sense now to me because
> this way we identify the cases where we have corrupted skipped
> components. 
>
>> Some alternatives to adding a virtual opcode:
>> 
>>  - Continue using regular moves with destination stride 2 as you're
>>    doing here, but call the contents of the skipped components
>> undefined
>>    after the instruction is executed.  Add some simple code to the FS
>>    validator pass to warn us if we ever break this assumption -- The
>>    most naive implementation of this could just make sure that the
>>    destination VGRF of any strided narrowing conversion is only ever
>>    written once on IVB (which could conceivably give false positives
>> but
>>    never false negatives).
>> 
>>  - Same as above, but instead of the validation pass, write a
>>    legalization pass to fix up invalid strided conversions to use a
>>    temporary instead of the real destination which would then copied
>>    into the real register (this may have benefits of its own because
>> the
>>    same logic could potentially by used to get rid of an amount of
>> cruft
>>    from the compiler back-end intended to address execution type
>>    alignment restrictions of the destination and sources of various
>>    instructions, which gets particularly annoying on CHV/BXT).
>> 
>
> Actually, such legalization is done for d2x conversions in
> fs_visitor::lower_d2x(), then the added virtual opcode will identify
> the temporary conversion to avoid the problem with the assumption done
> by future optimizations that you mentioned.
>
> I have a couple of patches, one to add the virtual opcode for the
> conversions done by the d2x pass [0] and another virtual opcode to fix
> a couple of mov indirects we generated [1].
>

I don't think the FS_OPCODE_FROM_DOUBLE opcode as you defined it in [0]
would substantially improves the situation, because you still use a
strided 32-bit destination so you'll be breaking the exact same
assumption MOV instructions were breaking.

That said, because you have a legalization pass in place already, the
virtual opcode seems unnecessary to isolate the optimizer from this
hardware bogosity.  Let's just make sure that the pass kicks in in all
cases where this could be a problem, e.g. for any double-precision
instructions where the destination type is narrower than the execution
type.

> As we are blocking the release and there more patches for review,
> another possibility is to land this patch [2] (replacing this patch)
> and then fix it later to not block Mesa 17.0 release more than needed.
>
> What do you think?
>
> Sam
>
>
> [0] https://github.com/Igalia/mesa/commit/2e98b6238996756ee489adcf98377
> d9314de2cc9
>    As we have {VEC4,FS}_OPCODE_FROM_DOUBLE, I renamed them later to
> SHADER_OPCODE_FROM_DOUBLE https://github.com/Igalia/mesa/commit/57e2a2d
> 1fbbe2464a42f590589826fe46a341e9d
>
> [1] https://github.com/Igalia/mesa/commit/2307ace353423713c5a9d193fc200
> 6ca738a841e
>
> [2] Notice this is a temporary patch before doing actual one: https://g
> ithub.com/samuelig/mesa/commit/61cd28175178a0c240be1492f9e535c30d908c77
>
>
> P.S: you can see changes [0] and [1] together in our v3 development
> branch:
>
> https://github.com/Igalia/mesa/tree/i965-fp64-gen7-ivb-scalar-vec4-rc3
>
>> > Sam
>> > 
>> > > >        dst = brw_reg_from_fs_reg(compiler->devinfo, inst,
>> > > >                                  &inst->dst, compressed);
>> > > >  
>> > > > -- 
>> > > > 2.11.0
>> > > > 
>> > > > _______________________________________________
>> > > > mesa-dev mailing list
>> > > > mesa-dev at lists.freedesktop.org
>> > > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170119/62a0c39f/attachment.sig>