[Mesa-dev] [PATCH v3 01/42] intel/compiler: handle conversions between int and half-float on atom

Wed Jan 16 07:18:12 UTC 2019

Hi Curro, thanks for looking into this, I have a few comments inline:

On Tue, 2019-01-15 at 14:34 -0800, Francisco Jerez wrote:
> Iago Toral Quiroga <itoral at igalia.com> writes:
> 
> > v2: adapted to work with the new regioning lowering pass
> > 
> > Reviewed-by: Topi Pohjolainen <topi.pohjolainen at intel.com> (v1)
> > ---
> >  src/intel/compiler/brw_ir_fs.h | 33 ++++++++++++++++++++++++++--
> > -----
> >  1 file changed, 26 insertions(+), 7 deletions(-)
> > 
> > diff --git a/src/intel/compiler/brw_ir_fs.h
> > b/src/intel/compiler/brw_ir_fs.h
> > index 3c23fb375e4..ba4d6a95720 100644
> > --- a/src/intel/compiler/brw_ir_fs.h
> > +++ b/src/intel/compiler/brw_ir_fs.h
> > @@ -497,9 +497,10 @@ is_unordered(const fs_inst *inst)
> >  }
> >  
> >  /**
> > - * Return whether the following regioning restriction applies to
> > the specified
> > - * instruction.  From the Cherryview PRM Vol 7. "Register Region
> > - * Restrictions":
> > + * Return whether one of the the following regioning restrictions
> > apply to the
> > + * specified instruction.
> > + *
> > + * From the Cherryview PRM Vol 7. "Register Region Restrictions":
> >   *
> >   * "When source or destination datatype is 64b or operation is
> > integer DWord
> >   *  multiply, regioning in Align1 must follow these rules:
> > @@ -508,6 +509,14 @@ is_unordered(const fs_inst *inst)
> >   *  2. Regioning must ensure Src.Vstride = Src.Width *
> > Src.Hstride.
> >   *  3. Source and Destination offset must be the same, except the
> > case of
> >   *     scalar source."
> > + *
> > + * From the Cherryview PRM Vol 7. "Register Region Restrictions":
> > + *
> > + *    "Conversion between Integer and HF (Half Float) must be
> > DWord
> > + *     aligned and strided by a DWord on the destination."
> > + *
> > + *    The same restriction is listed for other hardware platforms,
> > however,
> > + *    empirical testing suggests that only atom platforms are
> > affected.
> >   */
> >  static inline bool
> >  has_dst_aligned_region_restriction(const gen_device_info *devinfo,
> > @@ -518,10 +527,20 @@ has_dst_aligned_region_restriction(const
> > gen_device_info *devinfo,
> >           (inst->opcode == BRW_OPCODE_MUL || inst->opcode ==
> > BRW_OPCODE_MAD);
> >  
> >     if (type_sz(inst->dst.type) > 4 || type_sz(exec_type) > 4 ||
> > -       (type_sz(exec_type) == 4 && is_int_multiply))
> > -      return devinfo->is_cherryview ||
> > gen_device_info_is_9lp(devinfo);
> > -   else
> > -      return false;
> > +       (type_sz(exec_type) == 4 && is_int_multiply)) {
> > +      if (devinfo->is_cherryview ||
> > gen_device_info_is_9lp(devinfo))
> > +         return true;
> > +   }
> > +
> > +   const bool dst_type_is_hf = inst->dst.type ==
> > BRW_REGISTER_TYPE_HF;
> > +   const bool exec_type_is_hf = exec_type == BRW_REGISTER_TYPE_HF;
> > +   if ((dst_type_is_hf &&
> > !brw_reg_type_is_floating_point(exec_type)) ||
> > +       (exec_type_is_hf && !brw_reg_type_is_floating_point(inst-
> > >dst.type))) {
> > +      if (devinfo->is_cherryview ||
> > gen_device_info_is_9lp(devinfo))
> > +         return true;
> > +   }
> 
> While looking into this closely, I'm seeing substantial divergence
> between the behavior of the simulator, the hardware docs, and the
> restriction this is implementing...  The docs are certainly
> inconsistent
> about how and where this should be handled.
> 
> I'm suspecting that this restriction is more similar in nature to the
> one referred to in the regioning lowering pass as
> "is_narrowing_conversion", rather than the one handled by
> has_dst_aligned_region_restriction().  Probably we don't need to
> change
> this function nor the regioning pass for it to be honored, because
> that
> restriction is already implemented.  I have a feeling that the reason
> for this may be that the 16-bit pipeline lacks the ability to handle
> conversions from or to half-float, so the execution type is
> implicitly
> promoted to the matching (integer or floating-point) 32-bit type
> where
> any HF conversion would be needed.  And on those the usual alignment
> restriction of the destination to a larger execution type
> applies.  From
> the hardware docs for CHV *only*:
> 
> > When single precision and half precision floats are mixed between
> > source operands or between source and destination operand. In such
> > cases, single precision float is the execution datatype.
> 
> This would mean that an "add dst:f, src:hf, src:hf" is really
> computed
> with single precision (!).
> 
> The restriction you're quoting seems to be the following:
> 
> > BDW+
> > 
> > Conversion between Integer and HF (Half Float) must be DWord-
> > aligned
> > and strided by a DWord on the destination.
> > 
> > // Example:
> > add (8) r10.0<2>:hf r11.0<8;8,1>:w r12.0<8;8,1>:w
> > // Destination stride must be 2.
> > mov (8) r10.0<2>:w r11.0<8;8,1>:hf
> > // Destination stride must be 2.
> 
> However that restriction is apparently overriden on *most* projects
> except for BDW (where you aren't applying any restriction at all) by
> the
> following:

That's interesting and doesn't seem to match with my empirical
experience. Going by the results of the available CTS tests, it is
clear that this is required for atom (at least Braswell), we have
conversion tests between integer and half-float types that fail if we
don't honor this restriction. On the other hand, we haven't seen any
test failures on any other platform that we have tested (BDW, SKL,
KBL).

> > Project:  CHV, SKL+
> > 
> > There is a relaxed alignment rule for word destinations. When the
> > destination type is word (UW, W, HF), destination data types can be
> > aligned to either the lowest word or the second lowest word of the
> > execution channel. This means the destination data words can be
> > either
> > all in the even word locations or all in the odd word locations.
> > 
> > // Example:
> > add (8)  r10.0<2>:hf r11.0<8;8,1>:f r12.0<8;8,1>:hf
> > add (8)  r10.1<2>:hf r11.0<8;8,1>:f r12.0<8;8,1>:hf
> > 
> > // Note: The destination offset may be .0 or .1 although the
> > // destination subregister is required to be aligned to execution
> > // datatype.
> 
> Which seems somewhat nonsensical to me, because interpreted strictly
> it
> would rule out the possibility for any packed 16-bit operations...

I think this only applies in the context of mixed float and half-float
operations. We do emit packed half-float operations and we have not
observed any problems with that on any platform. Which makes me think
that when only half-float is involved, the execution type may not be
promoted to 32-bit.

> To make the matter even crazier, the Gen8 simulator enforces the
> restrictions on BDW *only*,

Yeah, this is really odd, because this is definitely needed on Braswell
too.

>  in fact any time that there is a conversion
> between half float and any other type, which is *almost* equivalent
> to
> the rule you'd quoted above except for more stringent requirements on
> the destination sub-register alignment of FP32 -> FP16 conversions
> that
> apply to BDW only.
> 
> Whatever of the above is going on, this patch doesn't give us what we
> want.  I think we want to drop it and fix get_exec_type() to promote
> the
> execution type correctly.  I'll reply with a patch to do that in a
> minute.

I understand that you mean promoting the execution type only when there
are mixed types involved and not when the instruction is exclusively
half-float on all its operands and destinations, right?

> > +
> > +   return false;
> >  }
> >  
> >  #endif
> > -- 
> > 2.17.1
> > 
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev at lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev