[Mesa-dev] [PATCH v3 24/42] intel/compiler: fix ddy for half-float in gen8

Wed Jan 23 08:37:50 UTC 2019

On Tue, 2019-01-22 at 16:36 -0800, Matt Turner wrote:
> On Tue, Jan 15, 2019 at 5:54 AM Iago Toral Quiroga <itoral at igalia.com
> > wrote:
> > 
> > We use ALign16 mode for this, since it is more convenient, but the
> > PRM
> > for Broadwell states in Volume 3D Media GPGPU, Chapter 'Register
> > region
> > restrictions', Section '1. Special Restrictions':
> > 
> >    "In Align16 mode, the channel selects and channel enables apply
> > to a
> >     pair of half-floats, because these parameters are defined for
> > DWord
> >     elements ONLY. This is applicable when both source and
> > destination
> >     are half-floats."
> > 
> > This means that we cannot select individual HF elements using
> > swizzles
> > like we do with 32-bit floats so we can't implement the required
> > regioning for this.
> > 
> > Use the gen11 path for this instead, which uses Align1 mode.
> > 
> > The restriction is not present in gen9 or gen10, where the Align16
> > implementation seems to work just fine.
> > 
> > Reviewed-by: Jason Ekstrand <jason at jlekstrand.net>
> > ---
> >  src/intel/compiler/brw_fs_generator.cpp | 10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> > 
> > diff --git a/src/intel/compiler/brw_fs_generator.cpp
> > b/src/intel/compiler/brw_fs_generator.cpp
> > index d0cc4a6d231..4310f0b7fdc 100644
> > --- a/src/intel/compiler/brw_fs_generator.cpp
> > +++ b/src/intel/compiler/brw_fs_generator.cpp
> > @@ -1339,8 +1339,14 @@ fs_generator::generate_ddy(const fs_inst
> > *inst,
> >     const uint32_t type_size = type_sz(src.type);
> > 
> >     if (inst->opcode == FS_OPCODE_DDY_FINE) {
> > -      /* produce accurate derivatives */
> > -      if (devinfo->gen >= 11) {
> > +      /* produce accurate derivatives. We can do this easily in
> > Align16
> > +       * but this is not supported in gen11+ and gen8 Align16
> > swizzles
> > +       * for Half-Float operands work in units of 32-bit and
> > always
> > +       * select pairs of consecutive half-float elements, so we
> > can't use
> > +       * use it for this.
> > +       */
> 
> Let's break this comment up and include (or move) the BSpec text from
> the commit message here. I wouldn't mention the "this is not
> supported
> in gen11+" because it's slightly unclear whether you're talking about
> "accurate derivatives" or "Align16". How about:
> 
> 
>       /* produce accurate derivatives.
>        *
>        * From the Broadwell PRM, Volume 7 (3D-Media-GPGPU)
>        * "Register Region Restrictions", Section "1. Special
> Restrictions":
>        *
>        *    "In Align16 mode, the channel selects and channel enables
> apply to
>        *     a pair of half-floats, because these parameters are
> defined for
>        *     DWord elements ONLY. This is applicable when both source
> and
>        *     destination are half-floats."
>        *
>        * So for half-float operations we use the Gen11+ Align1 path.
> CHV
>        * inherits its FP16 hardware from SKL, so it is not affected.
>        */
> 
> 
> > +      if (devinfo->gen >= 11 ||
> > +          (devinfo->gen == 8 && src.type == BRW_REGISTER_TYPE_HF))
> > {
> 
> 
> The docs are bad about telling us whether a BDW restriction applies
> to
> CHV as well, but in this case I suspect the answer is no. CHV seems
> to
> inherit its FP16 hw from SKL, which doesn't have the restriction as
> you say.
> 
> So I suspect you want devinfo->is_broadwell instead of devinfo->gen
> == 8.

I have just confirmed that your suspicion is correct, thanks for
bringing this up!

> With that (and confirmation that CHV isn't affected), this patch is
> 
> Reviewed-by: Matt Turner <mattst88 at gmail.com>
>