[Mesa-dev] [PATCH v3 01/42] intel/compiler: handle conversions between int and half-float on atom

Tue Jan 15 22:34:02 UTC 2019

Iago Toral Quiroga <itoral at igalia.com> writes:

> v2: adapted to work with the new regioning lowering pass
>
> Reviewed-by: Topi Pohjolainen <topi.pohjolainen at intel.com> (v1)
> ---
>  src/intel/compiler/brw_ir_fs.h | 33 ++++++++++++++++++++++++++-------
>  1 file changed, 26 insertions(+), 7 deletions(-)
>
> diff --git a/src/intel/compiler/brw_ir_fs.h b/src/intel/compiler/brw_ir_fs.h
> index 3c23fb375e4..ba4d6a95720 100644
> --- a/src/intel/compiler/brw_ir_fs.h
> +++ b/src/intel/compiler/brw_ir_fs.h
> @@ -497,9 +497,10 @@ is_unordered(const fs_inst *inst)
>  }
>  
>  /**
> - * Return whether the following regioning restriction applies to the specified
> - * instruction.  From the Cherryview PRM Vol 7. "Register Region
> - * Restrictions":
> + * Return whether one of the the following regioning restrictions apply to the
> + * specified instruction.
> + *
> + * From the Cherryview PRM Vol 7. "Register Region Restrictions":
>   *
>   * "When source or destination datatype is 64b or operation is integer DWord
>   *  multiply, regioning in Align1 must follow these rules:
> @@ -508,6 +509,14 @@ is_unordered(const fs_inst *inst)
>   *  2. Regioning must ensure Src.Vstride = Src.Width * Src.Hstride.
>   *  3. Source and Destination offset must be the same, except the case of
>   *     scalar source."
> + *
> + * From the Cherryview PRM Vol 7. "Register Region Restrictions":
> + *
> + *    "Conversion between Integer and HF (Half Float) must be DWord
> + *     aligned and strided by a DWord on the destination."
> + *
> + *    The same restriction is listed for other hardware platforms, however,
> + *    empirical testing suggests that only atom platforms are affected.
>   */
>  static inline bool
>  has_dst_aligned_region_restriction(const gen_device_info *devinfo,
> @@ -518,10 +527,20 @@ has_dst_aligned_region_restriction(const gen_device_info *devinfo,
>           (inst->opcode == BRW_OPCODE_MUL || inst->opcode == BRW_OPCODE_MAD);
>  
>     if (type_sz(inst->dst.type) > 4 || type_sz(exec_type) > 4 ||
> -       (type_sz(exec_type) == 4 && is_int_multiply))
> -      return devinfo->is_cherryview || gen_device_info_is_9lp(devinfo);
> -   else
> -      return false;
> +       (type_sz(exec_type) == 4 && is_int_multiply)) {
> +      if (devinfo->is_cherryview || gen_device_info_is_9lp(devinfo))
> +         return true;
> +   }
> +
> +   const bool dst_type_is_hf = inst->dst.type == BRW_REGISTER_TYPE_HF;
> +   const bool exec_type_is_hf = exec_type == BRW_REGISTER_TYPE_HF;
> +   if ((dst_type_is_hf && !brw_reg_type_is_floating_point(exec_type)) ||
> +       (exec_type_is_hf && !brw_reg_type_is_floating_point(inst->dst.type))) {
> +      if (devinfo->is_cherryview || gen_device_info_is_9lp(devinfo))
> +         return true;
> +   }

While looking into this closely, I'm seeing substantial divergence
between the behavior of the simulator, the hardware docs, and the
restriction this is implementing...  The docs are certainly inconsistent
about how and where this should be handled.

I'm suspecting that this restriction is more similar in nature to the
one referred to in the regioning lowering pass as
"is_narrowing_conversion", rather than the one handled by
has_dst_aligned_region_restriction().  Probably we don't need to change
this function nor the regioning pass for it to be honored, because that
restriction is already implemented.  I have a feeling that the reason
for this may be that the 16-bit pipeline lacks the ability to handle
conversions from or to half-float, so the execution type is implicitly
promoted to the matching (integer or floating-point) 32-bit type where
any HF conversion would be needed.  And on those the usual alignment
restriction of the destination to a larger execution type applies.  From
the hardware docs for CHV *only*:

| When single precision and half precision floats are mixed between
| source operands or between source and destination operand. In such
| cases, single precision float is the execution datatype.

This would mean that an "add dst:f, src:hf, src:hf" is really computed
with single precision (!).

The restriction you're quoting seems to be the following:

| BDW+
|
| Conversion between Integer and HF (Half Float) must be DWord-aligned
| and strided by a DWord on the destination.
|
| // Example:
| add (8) r10.0<2>:hf r11.0<8;8,1>:w r12.0<8;8,1>:w
| // Destination stride must be 2.
| mov (8) r10.0<2>:w r11.0<8;8,1>:hf
| // Destination stride must be 2.

However that restriction is apparently overriden on *most* projects
except for BDW (where you aren't applying any restriction at all) by the
following:

| Project:  CHV, SKL+
|
| There is a relaxed alignment rule for word destinations. When the
| destination type is word (UW, W, HF), destination data types can be
| aligned to either the lowest word or the second lowest word of the
| execution channel. This means the destination data words can be either
| all in the even word locations or all in the odd word locations.
| 
| // Example:
| add (8)  r10.0<2>:hf r11.0<8;8,1>:f r12.0<8;8,1>:hf
| add (8)  r10.1<2>:hf r11.0<8;8,1>:f r12.0<8;8,1>:hf
| 
| // Note: The destination offset may be .0 or .1 although the
| // destination subregister is required to be aligned to execution
| // datatype.

Which seems somewhat nonsensical to me, because interpreted strictly it
would rule out the possibility for any packed 16-bit operations...

To make the matter even crazier, the Gen8 simulator enforces the
restrictions on BDW *only*, in fact any time that there is a conversion
between half float and any other type, which is *almost* equivalent to
the rule you'd quoted above except for more stringent requirements on
the destination sub-register alignment of FP32 -> FP16 conversions that
apply to BDW only.

Whatever of the above is going on, this patch doesn't give us what we
want.  I think we want to drop it and fix get_exec_type() to promote the
execution type correctly.  I'll reply with a patch to do that in a
minute.

> +
> +   return false;
>  }
>  
>  #endif
> -- 
> 2.17.1
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 227 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20190115/fa8dc096/attachment.sig>