<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Mon, Oct 30, 2017 at 1:53 PM, Jason Ekstrand <span dir="ltr"><<a href="mailto:jason@jlekstrand.net" target="_blank">jason@jlekstrand.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><div class="h5">On Mon, Oct 30, 2017 at 11:53 AM, Jason Ekstrand <span dir="ltr"><<a href="mailto:jason@jlekstrand.net" target="_blank">jason@jlekstrand.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><div class="m_3339368304667839962h5">On Mon, Oct 30, 2017 at 5:10 AM, Iago Toral <span dir="ltr"><<a href="mailto:itoral@igalia.com" target="_blank">itoral@igalia.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="m_3339368304667839962m_-5169473378615448656HOEnZb"><div class="m_3339368304667839962m_-5169473378615448656h5">On Wed, 2017-10-25 at 16:26 -0700, Jason Ekstrand wrote:<br>
> Ballot intrinsics return a bitfield of subgroups. In GLSL and some<br>
> SPIR-V extensions, they return a uint64_t. In SPV_KHR_shader_ballot,<br>
> they return a uvec4. Also, some back-ends would rather pass around<br>
> 32-bit values because it's easier than messing with 64-bit all the<br>
> time.<br>
> To solve this mess, we make nir_lower_subgroups take a new parameter<br>
> called ballot_bit_size and it lowers whichever thing it gets in from<br>
> the<br>
> source language (uint64_t or uvec4) to a scalar with the specified<br>
> number of bits. This replaces a chunk of the old lowering code.<br>
><br>
> Reviewed-by: Lionel Landwerlin <<a href="mailto:lionel.g.landwerlin@intel.com" target="_blank">lionel.g.landwerlin@intel.com</a><wbr>><br>
> ---<br>
> src/compiler/nir/nir.h <wbr> | 3 +-<br>
> src/compiler/nir/nir_lower_su<wbr>bgroups.c | 101<br>
> ++++++++++++++++++++++++++++++<wbr>+--<br>
> src/compiler/nir/nir_opt_intr<wbr>insics.c | 18 ------<br>
> src/intel/compiler/brw_compil<wbr>er.c | 1 -<br>
> src/intel/compiler/brw_nir.c <wbr> | 1 +<br>
> 5 files changed, 98 insertions(+), 26 deletions(-)<br>
><br>
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h<br>
> index 1a25d7b..563b57f 100644<br>
> --- a/src/compiler/nir/nir.h<br>
> +++ b/src/compiler/nir/nir.h<br>
> @@ -1854,8 +1854,6 @@ typedef struct nir_shader_compiler_options {<br>
> */<br>
> bool use_interpolated_input_intrins<wbr>ics;<br>
> <br>
> - unsigned max_subgroup_size;<br>
> -<br>
> unsigned max_unroll_iterations;<br>
> } nir_shader_compiler_options;<br>
> <br>
> @@ -2469,6 +2467,7 @@ bool nir_lower_samplers_as_deref(ni<wbr>r_shader<br>
> *shader,<br>
> <wbr> const struct gl_shader_program<br>
> *shader_program);<br>
> <br>
> typedef struct nir_lower_subgroups_options {<br>
> + uint8_t ballot_bit_size;<br>
> bool lower_to_scalar:1;<br>
> bool lower_vote_trivial:1;<br>
> bool lower_subgroup_masks:1;<br>
> diff --git a/src/compiler/nir/nir_lower_s<wbr>ubgroups.c<br>
> b/src/compiler/nir/nir_lower_s<wbr>ubgroups.c<br>
> index 02738c4..1969740 100644<br>
> --- a/src/compiler/nir/nir_lower_s<wbr>ubgroups.c<br>
> +++ b/src/compiler/nir/nir_lower_s<wbr>ubgroups.c<br>
> @@ -28,6 +28,43 @@<br>
> * \file nir_opt_intrinsics.c<br>
> */<br>
> <br>
> +/* Converts a uint32_t or uint64_t value to uint64_t or uvec4 */<br>
> +static nir_ssa_def *<br>
> +uint_to_ballot_type(nir_build<wbr>er *b, nir_ssa_def *value,<br>
> + unsigned num_components, unsigned bit_size,<br>
> + uint32_t extend_val)<br>
> +{<br>
> + assert(value->num_componen<wbr>ts == 1);<br>
> + assert(value->bit_size == 32 || value->bit_size == 64);<br>
> +<br>
> + nir_ssa_def *extend = nir_imm_int(b, extend_val);<br>
<br>
</div></div>Is it required that we do this extension? would it be incorrect if we<br>
extended with 0's?<br><div><div class="m_3339368304667839962m_-5169473378615448656h5"></div></div></blockquote><div><br></div></div></div><div>Thanks for making me look at that. The Vulkan spec requires that they be set to 0. I guess I need to go rework some things. :)<br></div></div></div></div></blockquote><div><br></div></div></div><div>I did a bit more looking and things here are thoroughly insane. I've filed CTS and spec bugs. Let's see where they go before I rewrite anything.<br></div></div></div></div></blockquote><div><br></div><div>With the advent of the patch Neil sent today, I've come to the conclusion that the Vulkan spec is sane and consistent with GL. I've rebased subgroups on Neil's patch and sent out one new patch and new versions of 3 others. Please re-review.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div></div><div><div class="h5"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div></div><div><div class="m_3339368304667839962h5"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="m_3339368304667839962m_-5169473378615448656h5">
> + if (num_components > 1) {<br>
> + /* SPIR-V uses a uvec4 for ballot values */<br>
> + assert(num_components == 4);<br>
> + assert(bit_size == 32);<br>
> +<br>
> + if (value->bit_size == 32) {<br>
> + return nir_vec4(b, value, extend, extend, extend);<br>
> + } else {<br>
> + assert(value->bit_si<wbr>ze == 64);<br>
> + return nir_vec4(b, nir_unpack_64_2x32_split_x(b, value),<br>
> + n<wbr>ir_unpack_64_2x32_split_y(b, value),<br>
> + e<wbr>xtend, extend);<br>
> + }<br>
> + } else {<br>
> + /* GLSL uses a uint64_t for ballot values */<br>
> + assert(num_components == 1);<br>
> + assert(bit_size == 64);<br>
> +<br>
> + if (value->bit_size == 32) {<br>
> + return nir_pack_64_2x32_split(b, value, extend);<br>
> + } else {<br>
> + assert(value->bit_si<wbr>ze == 64);<br>
> + return value;<br>
> + }<br>
> + }<br>
> +}<br>
> +<br>
> static nir_ssa_def *<br>
> lower_read_invocation_to_scal<wbr>ar(nir_builder *b, nir_intrinsic_instr<br>
> *intrin)<br>
> {<br>
> @@ -86,24 +123,78 @@ lower_subgroups_intrin(nir_bui<wbr>lder *b,<br>
> nir_intrinsic_instr *intrin,<br>
> if (!options->lower_subgroup_mask<wbr>s)<br>
> return NULL;<br>
> <br>
> + uint64_t mask;<br>
> + switch (intrin->intrinsic) {<br>
> + case nir_intrinsic_load_subgroup_eq<wbr>_mask:<br>
> + mask = 1ull;<br>
> + break;<br>
> + case nir_intrinsic_load_subgroup_ge<wbr>_mask:<br>
> + case nir_intrinsic_load_subgroup_lt<wbr>_mask:<br>
> + mask = ~0ull;<br>
> + break;<br>
> + case nir_intrinsic_load_subgroup_gt<wbr>_mask:<br>
> + case nir_intrinsic_load_subgroup_le<wbr>_mask:<br>
> + mask = ~1ull;<br>
> + break;<br>
> + default:<br>
> + unreachable("you seriously can't tell this is<br>
> unreachable?");<br>
> + }<br>
> +<br>
> nir_ssa_def *count = nir_load_subgroup_invocation(b<wbr>);<br>
> + nir_ssa_def *shifted;<br>
> + if (options->ballot_bit_size == 32 && intrin-<br>
> >dest.ssa.bit_size == 32) {<br>
<br>
</div></div>Maybe add a comment here stating that this is the SPIR-V path where we<br>
always expect uvec4 results.<br><div class="m_3339368304667839962m_-5169473378615448656HOEnZb"><div class="m_3339368304667839962m_-5169473378615448656h5"></div></div></blockquote><div><br></div></div></div><div>Sure.<br></div><span><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="m_3339368304667839962m_-5169473378615448656HOEnZb"><div class="m_3339368304667839962m_-5169473378615448656h5">
> + assert(intrin->dest.<wbr>ssa.num_components == 4);<br>
> + shifted = nir_ishl(b, nir_imm_int(b, mask), count);<br>
> + } else {<br>
> + /* We're either working with 64-bit types natively or we're<br>
> in OpenGL<br>
> + * where we want a uint64_t as our final value. In either<br>
> case we<br>
> + * know that we have 64-bit types. In the first case, we<br>
> need to use<br>
> + * 64 bits because of the native subgroup size. In the<br>
> second, we<br>
> + * want a 64-bit result and a 64-bit shift is likely more<br>
> efficient<br>
> + * than messing around with 32-bit shifts and packing.<br>
> + */<br>
> + assert(options->ball<wbr>ot_bit_size == 64 ||<br>
> + intrin->dest.<wbr>ssa.bit_size == 64);<br>
> + shifted = nir_ishl(b, nir_imm_int64(b, mask), count);<br>
> + }<br>
<br>
</div></div></blockquote></span></div><br></div></div>
</blockquote></div></div></div><br></div></div>
</blockquote></div><br></div></div>