[Mesa-dev] [PATCH v3 24/42] intel/compiler: fix ddy for half-float in gen8
Matt Turner
mattst88 at gmail.com
Wed Jan 23 00:36:53 UTC 2019
On Tue, Jan 15, 2019 at 5:54 AM Iago Toral Quiroga <itoral at igalia.com> wrote:
>
> We use ALign16 mode for this, since it is more convenient, but the PRM
> for Broadwell states in Volume 3D Media GPGPU, Chapter 'Register region
> restrictions', Section '1. Special Restrictions':
>
> "In Align16 mode, the channel selects and channel enables apply to a
> pair of half-floats, because these parameters are defined for DWord
> elements ONLY. This is applicable when both source and destination
> are half-floats."
>
> This means that we cannot select individual HF elements using swizzles
> like we do with 32-bit floats so we can't implement the required
> regioning for this.
>
> Use the gen11 path for this instead, which uses Align1 mode.
>
> The restriction is not present in gen9 or gen10, where the Align16
> implementation seems to work just fine.
>
> Reviewed-by: Jason Ekstrand <jason at jlekstrand.net>
> ---
> src/intel/compiler/brw_fs_generator.cpp | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs_generator.cpp b/src/intel/compiler/brw_fs_generator.cpp
> index d0cc4a6d231..4310f0b7fdc 100644
> --- a/src/intel/compiler/brw_fs_generator.cpp
> +++ b/src/intel/compiler/brw_fs_generator.cpp
> @@ -1339,8 +1339,14 @@ fs_generator::generate_ddy(const fs_inst *inst,
> const uint32_t type_size = type_sz(src.type);
>
> if (inst->opcode == FS_OPCODE_DDY_FINE) {
> - /* produce accurate derivatives */
> - if (devinfo->gen >= 11) {
> + /* produce accurate derivatives. We can do this easily in Align16
> + * but this is not supported in gen11+ and gen8 Align16 swizzles
> + * for Half-Float operands work in units of 32-bit and always
> + * select pairs of consecutive half-float elements, so we can't use
> + * use it for this.
> + */
Let's break this comment up and include (or move) the BSpec text from
the commit message here. I wouldn't mention the "this is not supported
in gen11+" because it's slightly unclear whether you're talking about
"accurate derivatives" or "Align16". How about:
/* produce accurate derivatives.
*
* From the Broadwell PRM, Volume 7 (3D-Media-GPGPU)
* "Register Region Restrictions", Section "1. Special Restrictions":
*
* "In Align16 mode, the channel selects and channel enables apply to
* a pair of half-floats, because these parameters are defined for
* DWord elements ONLY. This is applicable when both source and
* destination are half-floats."
*
* So for half-float operations we use the Gen11+ Align1 path. CHV
* inherits its FP16 hardware from SKL, so it is not affected.
*/
> + if (devinfo->gen >= 11 ||
> + (devinfo->gen == 8 && src.type == BRW_REGISTER_TYPE_HF)) {
The docs are bad about telling us whether a BDW restriction applies to
CHV as well, but in this case I suspect the answer is no. CHV seems to
inherit its FP16 hw from SKL, which doesn't have the restriction as
you say.
So I suspect you want devinfo->is_broadwell instead of devinfo->gen == 8.
With that (and confirmation that CHV isn't affected), this patch is
Reviewed-by: Matt Turner <mattst88 at gmail.com>
More information about the mesa-dev
mailing list