[Mesa-dev] [PATCH v3 24/42] intel/compiler: fix ddy for half-float in gen8
Iago Toral
itoral at igalia.com
Wed Jan 23 08:37:50 UTC 2019
On Tue, 2019-01-22 at 16:36 -0800, Matt Turner wrote:
> On Tue, Jan 15, 2019 at 5:54 AM Iago Toral Quiroga <itoral at igalia.com
> > wrote:
> >
> > We use ALign16 mode for this, since it is more convenient, but the
> > PRM
> > for Broadwell states in Volume 3D Media GPGPU, Chapter 'Register
> > region
> > restrictions', Section '1. Special Restrictions':
> >
> > "In Align16 mode, the channel selects and channel enables apply
> > to a
> > pair of half-floats, because these parameters are defined for
> > DWord
> > elements ONLY. This is applicable when both source and
> > destination
> > are half-floats."
> >
> > This means that we cannot select individual HF elements using
> > swizzles
> > like we do with 32-bit floats so we can't implement the required
> > regioning for this.
> >
> > Use the gen11 path for this instead, which uses Align1 mode.
> >
> > The restriction is not present in gen9 or gen10, where the Align16
> > implementation seems to work just fine.
> >
> > Reviewed-by: Jason Ekstrand <jason at jlekstrand.net>
> > ---
> > src/intel/compiler/brw_fs_generator.cpp | 10 ++++++++--
> > 1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/src/intel/compiler/brw_fs_generator.cpp
> > b/src/intel/compiler/brw_fs_generator.cpp
> > index d0cc4a6d231..4310f0b7fdc 100644
> > --- a/src/intel/compiler/brw_fs_generator.cpp
> > +++ b/src/intel/compiler/brw_fs_generator.cpp
> > @@ -1339,8 +1339,14 @@ fs_generator::generate_ddy(const fs_inst
> > *inst,
> > const uint32_t type_size = type_sz(src.type);
> >
> > if (inst->opcode == FS_OPCODE_DDY_FINE) {
> > - /* produce accurate derivatives */
> > - if (devinfo->gen >= 11) {
> > + /* produce accurate derivatives. We can do this easily in
> > Align16
> > + * but this is not supported in gen11+ and gen8 Align16
> > swizzles
> > + * for Half-Float operands work in units of 32-bit and
> > always
> > + * select pairs of consecutive half-float elements, so we
> > can't use
> > + * use it for this.
> > + */
>
> Let's break this comment up and include (or move) the BSpec text from
> the commit message here. I wouldn't mention the "this is not
> supported
> in gen11+" because it's slightly unclear whether you're talking about
> "accurate derivatives" or "Align16". How about:
>
>
> /* produce accurate derivatives.
> *
> * From the Broadwell PRM, Volume 7 (3D-Media-GPGPU)
> * "Register Region Restrictions", Section "1. Special
> Restrictions":
> *
> * "In Align16 mode, the channel selects and channel enables
> apply to
> * a pair of half-floats, because these parameters are
> defined for
> * DWord elements ONLY. This is applicable when both source
> and
> * destination are half-floats."
> *
> * So for half-float operations we use the Gen11+ Align1 path.
> CHV
> * inherits its FP16 hardware from SKL, so it is not affected.
> */
>
>
> > + if (devinfo->gen >= 11 ||
> > + (devinfo->gen == 8 && src.type == BRW_REGISTER_TYPE_HF))
> > {
>
>
> The docs are bad about telling us whether a BDW restriction applies
> to
> CHV as well, but in this case I suspect the answer is no. CHV seems
> to
> inherit its FP16 hw from SKL, which doesn't have the restriction as
> you say.
>
> So I suspect you want devinfo->is_broadwell instead of devinfo->gen
> == 8.
I have just confirmed that your suspicion is correct, thanks for
bringing this up!
> With that (and confirmation that CHV isn't affected), this patch is
>
> Reviewed-by: Matt Turner <mattst88 at gmail.com>
>
More information about the mesa-dev
mailing list