[Mesa-dev] [PATCH v3 24/42] intel/compiler: fix ddy for half-float in gen8

Wed Jan 23 00:36:53 UTC 2019

On Tue, Jan 15, 2019 at 5:54 AM Iago Toral Quiroga <itoral at igalia.com> wrote:
>
> We use ALign16 mode for this, since it is more convenient, but the PRM
> for Broadwell states in Volume 3D Media GPGPU, Chapter 'Register region
> restrictions', Section '1. Special Restrictions':
>
>    "In Align16 mode, the channel selects and channel enables apply to a
>     pair of half-floats, because these parameters are defined for DWord
>     elements ONLY. This is applicable when both source and destination
>     are half-floats."
>
> This means that we cannot select individual HF elements using swizzles
> like we do with 32-bit floats so we can't implement the required
> regioning for this.
>
> Use the gen11 path for this instead, which uses Align1 mode.
>
> The restriction is not present in gen9 or gen10, where the Align16
> implementation seems to work just fine.
>
> Reviewed-by: Jason Ekstrand <jason at jlekstrand.net>
> ---
>  src/intel/compiler/brw_fs_generator.cpp | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs_generator.cpp b/src/intel/compiler/brw_fs_generator.cpp
> index d0cc4a6d231..4310f0b7fdc 100644
> --- a/src/intel/compiler/brw_fs_generator.cpp
> +++ b/src/intel/compiler/brw_fs_generator.cpp
> @@ -1339,8 +1339,14 @@ fs_generator::generate_ddy(const fs_inst *inst,
>     const uint32_t type_size = type_sz(src.type);
>
>     if (inst->opcode == FS_OPCODE_DDY_FINE) {
> -      /* produce accurate derivatives */
> -      if (devinfo->gen >= 11) {
> +      /* produce accurate derivatives. We can do this easily in Align16
> +       * but this is not supported in gen11+ and gen8 Align16 swizzles
> +       * for Half-Float operands work in units of 32-bit and always
> +       * select pairs of consecutive half-float elements, so we can't use
> +       * use it for this.
> +       */

Let's break this comment up and include (or move) the BSpec text from
the commit message here. I wouldn't mention the "this is not supported
in gen11+" because it's slightly unclear whether you're talking about
"accurate derivatives" or "Align16". How about:

      /* produce accurate derivatives.
       *
       * From the Broadwell PRM, Volume 7 (3D-Media-GPGPU)
       * "Register Region Restrictions", Section "1. Special Restrictions":
       *
       *    "In Align16 mode, the channel selects and channel enables apply to
       *     a pair of half-floats, because these parameters are defined for
       *     DWord elements ONLY. This is applicable when both source and
       *     destination are half-floats."
       *
       * So for half-float operations we use the Gen11+ Align1 path. CHV
       * inherits its FP16 hardware from SKL, so it is not affected.
       */

> +      if (devinfo->gen >= 11 ||
> +          (devinfo->gen == 8 && src.type == BRW_REGISTER_TYPE_HF)) {

The docs are bad about telling us whether a BDW restriction applies to
CHV as well, but in this case I suspect the answer is no. CHV seems to
inherit its FP16 hw from SKL, which doesn't have the restriction as
you say.

So I suspect you want devinfo->is_broadwell instead of devinfo->gen == 8.

With that (and confirmation that CHV isn't affected), this patch is

Reviewed-by: Matt Turner <mattst88 at gmail.com>