[Mesa-dev] [PATCH 16/20] nir: Reduce destination size of ballot intrinsic when possible

Tue Jul 18 00:35:26 UTC 2017

On Monday, July 10, 2017 10:22:23 AM PDT Matt Turner wrote:
> Some hardware, like i965, doesn't support group sizes greater than 32.
> In that case, we can reduce the destination size of the ballot
> intrinsic, which will simplify our code generation.
> ---
> v2: Just change the intrinsic size, and don't add a new intrinsic (Connor)
> 
>  src/compiler/nir/nir.h                |  2 ++
>  src/compiler/nir/nir_opt_intrinsics.c | 18 ++++++++++++++++++
>  src/intel/compiler/brw_compiler.c     |  1 +
>  3 files changed, 21 insertions(+)
> 
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index 1e2d7d3cf6..5518807b0b 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -1842,6 +1842,8 @@ typedef struct nir_shader_compiler_options {
>      */
>     bool use_interpolated_input_intrinsics;
>  
> +   unsigned max_subgroup_size;
> +
>     unsigned max_unroll_iterations;
>  } nir_shader_compiler_options;
>  
> diff --git a/src/compiler/nir/nir_opt_intrinsics.c b/src/compiler/nir/nir_opt_intrinsics.c
> index 0358680aae..d30c1cf6bb 100644
> --- a/src/compiler/nir/nir_opt_intrinsics.c
> +++ b/src/compiler/nir/nir_opt_intrinsics.c
> @@ -62,6 +62,24 @@ opt_intrinsics_impl(nir_function_impl *impl)
>              replacement = nir_imm_int(&b, NIR_TRUE);
>              break;
>           }
> +         case nir_intrinsic_ballot: {
> +            assert(b.shader->options->max_subgroup_size != 0);
> +            if (b.shader->options->max_subgroup_size > 32 ||
> +                intrin->dest.ssa.bit_size <= 32)
> +               continue;
> +
> +            nir_intrinsic_instr *ballot =
> +               nir_intrinsic_instr_create(b.shader, nir_intrinsic_ballot);
> +            nir_ssa_dest_init(&ballot->instr, &ballot->dest, 1, 32, NULL);
> +            ballot->src[0] = intrin->src[0];

nir_src_copy

> +
> +            nir_builder_instr_insert(&b, &ballot->instr);
> +
> +            replacement = nir_pack_64_2x32_split(&b,
> +                                                 &ballot->dest.ssa,
> +                                                 nir_imm_int(&b, 0));
> +            break;
> +         }
>           default:
>              break;
>           }
> diff --git a/src/intel/compiler/brw_compiler.c b/src/intel/compiler/brw_compiler.c
> index 397c8cccf9..b910fcbc3d 100644
> --- a/src/intel/compiler/brw_compiler.c
> +++ b/src/intel/compiler/brw_compiler.c
> @@ -57,6 +57,7 @@ static const struct nir_shader_compiler_options scalar_nir_options = {
>     .lower_unpack_snorm_4x8 = true,
>     .lower_unpack_unorm_2x16 = true,
>     .lower_unpack_unorm_4x8 = true,
> +   .max_subgroup_size = 64, /* FIXME */

Okay...because you haven't updated the i965/fs backend to actually pay
attention to the bitsize and select UD or UQ (until patch 18), you set
it to 64 here to disable lowering...and then enable it later.

That works.  The way I would have expected the series to look is:

1. In patch 14, have brw_fs_nir.cpp select UQ or UD based on bitsize.
   (The bitsize will always be 64, so you'll always do UQ.)

2. In patch 16v3, add this lowering and set max_subgroup_size to 32,
   so we add the new pass and immediately use it.

Essentially, squash one hunk of 18 into 14, and the other into 16.

What you've done isn't wrong, and looks good in the end, so either way,
patches 16v2 and 18v2 are:

Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

(with the nir_src_copy thing fixed)

>     .max_unroll_iterations = 32,
>  };
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part.
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170717/76674a3b/attachment.sig>