[Mesa-dev] [PATCH v3 43/48] nir/lower_subgroups: Lower ballot intrinsics to the specified bit size

Tue Oct 31 23:57:12 UTC 2017

On Mon, Oct 30, 2017 at 1:53 PM, Jason Ekstrand <jason at jlekstrand.net>
wrote:

> On Mon, Oct 30, 2017 at 11:53 AM, Jason Ekstrand <jason at jlekstrand.net>
> wrote:
>
>> On Mon, Oct 30, 2017 at 5:10 AM, Iago Toral <itoral at igalia.com> wrote:
>>
>>> On Wed, 2017-10-25 at 16:26 -0700, Jason Ekstrand wrote:
>>> > Ballot intrinsics return a bitfield of subgroups.  In GLSL and some
>>> > SPIR-V extensions, they return a uint64_t.  In SPV_KHR_shader_ballot,
>>> > they return a uvec4.  Also, some back-ends would rather pass around
>>> > 32-bit values because it's easier than messing with 64-bit all the
>>> > time.
>>> > To solve this mess, we make nir_lower_subgroups take a new parameter
>>> > called ballot_bit_size and it lowers whichever thing it gets in from
>>> > the
>>> > source language (uint64_t or uvec4) to a scalar with the specified
>>> > number of bits.  This replaces a chunk of the old lowering code.
>>> >
>>> > Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
>>> > ---
>>> >  src/compiler/nir/nir.h                 |   3 +-
>>> >  src/compiler/nir/nir_lower_subgroups.c | 101
>>> > +++++++++++++++++++++++++++++++--
>>> >  src/compiler/nir/nir_opt_intrinsics.c  |  18 ------
>>> >  src/intel/compiler/brw_compiler.c      |   1 -
>>> >  src/intel/compiler/brw_nir.c           |   1 +
>>> >  5 files changed, 98 insertions(+), 26 deletions(-)
>>> >
>>> > diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
>>> > index 1a25d7b..563b57f 100644
>>> > --- a/src/compiler/nir/nir.h
>>> > +++ b/src/compiler/nir/nir.h
>>> > @@ -1854,8 +1854,6 @@ typedef struct nir_shader_compiler_options {
>>> >      */
>>> >     bool use_interpolated_input_intrinsics;
>>> >
>>> > -   unsigned max_subgroup_size;
>>> > -
>>> >     unsigned max_unroll_iterations;
>>> >  } nir_shader_compiler_options;
>>> >
>>> > @@ -2469,6 +2467,7 @@ bool nir_lower_samplers_as_deref(nir_shader
>>> > *shader,
>>> >                                   const struct gl_shader_program
>>> > *shader_program);
>>> >
>>> >  typedef struct nir_lower_subgroups_options {
>>> > +   uint8_t ballot_bit_size;
>>> >     bool lower_to_scalar:1;
>>> >     bool lower_vote_trivial:1;
>>> >     bool lower_subgroup_masks:1;
>>> > diff --git a/src/compiler/nir/nir_lower_subgroups.c
>>> > b/src/compiler/nir/nir_lower_subgroups.c
>>> > index 02738c4..1969740 100644
>>> > --- a/src/compiler/nir/nir_lower_subgroups.c
>>> > +++ b/src/compiler/nir/nir_lower_subgroups.c
>>> > @@ -28,6 +28,43 @@
>>> >   * \file nir_opt_intrinsics.c
>>> >   */
>>> >
>>> > +/* Converts a uint32_t or uint64_t value to uint64_t or uvec4 */
>>> > +static nir_ssa_def *
>>> > +uint_to_ballot_type(nir_builder *b, nir_ssa_def *value,
>>> > +                    unsigned num_components, unsigned bit_size,
>>> > +                    uint32_t extend_val)
>>> > +{
>>> > +   assert(value->num_components == 1);
>>> > +   assert(value->bit_size == 32 || value->bit_size == 64);
>>> > +
>>> > +   nir_ssa_def *extend = nir_imm_int(b, extend_val);
>>>
>>> Is it required that we do this extension? would it be incorrect if we
>>> extended with 0's?
>>>
>>
>> Thanks for making me look at that.  The Vulkan spec requires that they be
>> set to 0.  I guess I need to go rework some things. :)
>>
>
> I did a bit more looking and things here are thoroughly insane.  I've
> filed CTS and spec bugs.  Let's see where they go before I rewrite anything.
>

With the advent of the patch Neil sent today, I've come to the conclusion
that the Vulkan spec is sane and consistent with GL.  I've rebased
subgroups on Neil's patch and sent out one new patch and new versions of 3
others.  Please re-review.

> > +   if (num_components > 1) {
>>> > +      /* SPIR-V uses a uvec4 for ballot values */
>>> > +      assert(num_components == 4);
>>> > +      assert(bit_size == 32);
>>> > +
>>> > +      if (value->bit_size == 32) {
>>> > +         return nir_vec4(b, value, extend, extend, extend);
>>> > +      } else {
>>> > +         assert(value->bit_size == 64);
>>> > +         return nir_vec4(b, nir_unpack_64_2x32_split_x(b, value),
>>> > +                            nir_unpack_64_2x32_split_y(b, value),
>>> > +                            extend, extend);
>>> > +      }
>>> > +   } else {
>>> > +      /* GLSL uses a uint64_t for ballot values */
>>> > +      assert(num_components == 1);
>>> > +      assert(bit_size == 64);
>>> > +
>>> > +      if (value->bit_size == 32) {
>>> > +         return nir_pack_64_2x32_split(b, value, extend);
>>> > +      } else {
>>> > +         assert(value->bit_size == 64);
>>> > +         return value;
>>> > +      }
>>> > +   }
>>> > +}
>>> > +
>>> >  static nir_ssa_def *
>>> >  lower_read_invocation_to_scalar(nir_builder *b, nir_intrinsic_instr
>>> > *intrin)
>>> >  {
>>> > @@ -86,24 +123,78 @@ lower_subgroups_intrin(nir_builder *b,
>>> > nir_intrinsic_instr *intrin,
>>> >        if (!options->lower_subgroup_masks)
>>> >           return NULL;
>>> >
>>> > +      uint64_t mask;
>>> > +      switch (intrin->intrinsic) {
>>> > +      case nir_intrinsic_load_subgroup_eq_mask:
>>> > +         mask = 1ull;
>>> > +         break;
>>> > +      case nir_intrinsic_load_subgroup_ge_mask:
>>> > +      case nir_intrinsic_load_subgroup_lt_mask:
>>> > +         mask = ~0ull;
>>> > +         break;
>>> > +      case nir_intrinsic_load_subgroup_gt_mask:
>>> > +      case nir_intrinsic_load_subgroup_le_mask:
>>> > +         mask = ~1ull;
>>> > +         break;
>>> > +      default:
>>> > +         unreachable("you seriously can't tell this is
>>> > unreachable?");
>>> > +      }
>>> > +
>>> >        nir_ssa_def *count = nir_load_subgroup_invocation(b);
>>> > +      nir_ssa_def *shifted;
>>> > +      if (options->ballot_bit_size == 32 && intrin-
>>> > >dest.ssa.bit_size == 32) {
>>>
>>> Maybe add a comment here stating that this is the SPIR-V path where we
>>> always expect uvec4 results.
>>>
>>
>> Sure.
>>
>>
>>> > +         assert(intrin->dest.ssa.num_components == 4);
>>> > +         shifted = nir_ishl(b, nir_imm_int(b, mask), count);
>>> > +      } else {
>>> > +         /* We're either working with 64-bit types natively or we're
>>> > in OpenGL
>>> > +          * where we want a uint64_t as our final value.  In either
>>> > case we
>>> > +          * know that we have 64-bit types.  In the first case, we
>>> > need to use
>>> > +          * 64 bits because of the native subgroup size.  In the
>>> > second, we
>>> > +          * want a 64-bit result and a 64-bit shift is likely more
>>> > efficient
>>> > +          * than messing around with 32-bit shifts and packing.
>>> > +          */
>>> > +         assert(options->ballot_bit_size == 64 ||
>>> > +                intrin->dest.ssa.bit_size == 64);
>>> > +         shifted = nir_ishl(b, nir_imm_int64(b, mask), count);
>>> > +      }
>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20171031/dd9f6f1e/attachment-0001.html>