[Mesa-dev] [PATCH 4/4] nir: add ARB_shader_ballot and ARB_shader_group_vote instructions

Jason Ekstrand jason at jlekstrand.net
Tue Jun 6 20:45:37 UTC 2017


On Mon, Jun 5, 2017 at 9:52 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:

> On Mon, Jun 5, 2017 at 6:37 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>
>> I pushed a v2 at
>> https://cgit.freedesktop.org/~cwabbott0/mesa/log/?h=nir-divergence-v2.
>> I'm not sure if I like this version better, though. I'll have to think
>> about it. In the meantime, feel free to take a look.
>>
>
> I've taken a skim through the branch and I agree that I'm not sure
> either.  Here are a few thoughts in no particular order:
>
>  1) Other than the fact that it's a pile of churn, it doesn't seem to make
> too much difference whether dFdx and dFdy are ALU or intrinsics
>
>  2) Convergent instructions are, in a lot of ways, easier to deal with
> than plain cross-thread ones.  Convergent ops can always be moved up the
> dominance tree or down into uniform control-flow.  Regular cross-thread
> instructions can't be moved across any non-uniform control-flow.
>
>  3) dFdx and dFdy are weird because they're convergent so it's clear they
> are special but not clear they should be intrinsics instead of ALU
>
>  4) I like the nir_instr_is_convergent() and nir_instr_is_cross_thread()
> helpers
>
>  5) non-convergent cross-thread instructions should definitely be
> intrinsics.
>
>  6) I think the shader ballot stuff is all non-convergent cross-thread as
> are some of the more advanced subgroup operations (see HLSL shader model
> 6.0).
>
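
To make point 2 above concrete, here is a toy Python model of the
code-motion rules as I read them.  Kind and can_move() are made-up names for
illustration, not anything in NIR, and the model ignores every other
constraint (data dependencies, side effects, and so on).

```python
from enum import Enum


class Kind(Enum):
    """Hypothetical instruction classes; not real NIR types."""
    PURE = "pure"                  # ordinary ALU op, a pure function
    CONVERGENT = "convergent"      # e.g. derivatives
    CROSS_THREAD = "cross_thread"  # non-convergent cross-thread, e.g. ballot


def can_move(kind, hoisting, crosses_non_uniform_cf):
    """Return True if the move is allowed under the rules in point 2.

    hoisting: True when moving up the dominance tree, False when sinking
              into nested control flow.
    crosses_non_uniform_cf: True when the move crosses control flow that
              is not known to be uniform.
    """
    if kind is Kind.PURE:
        # Pure ops can be re-ordered at will.
        return True
    if kind is Kind.CONVERGENT:
        # Hoisting is always fine; sinking only into uniform control flow.
        return hoisting or not crosses_non_uniform_cf
    # Regular cross-thread ops never cross non-uniform control flow.
    return not crosses_non_uniform_cf
```

So a derivative could be hoisted out of a divergent if, but a ballot could
not be moved in either direction across it.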

Having slept on things a bit, I think I've come to the conclusion that
leaving dFdx and dFdy as-is should be fine so long as we have the
nir_instr_is_convergent() and _is_cross_thread() helpers.  We need to do
special casing in those for texture instructions anyway so adding in a
quick switch for ALU derivatives isn't bad.  For shader_ballot type
instructions, I think they're probably best done as intrinsics for now.
That way the compiler will leave them alone most of the time and only
things that actually know what they're doing will ever try to optimize them.
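
Roughly what I have in mind for the helpers, written as a Python sketch
rather than real NIR C code.  The Instr class and the op sets below are
assumptions for illustration (the real helpers would switch on the actual
nir_instr types), and I'm assuming here that convergent ops also count as
cross-thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical, simplified stand-in for a NIR instruction."""
    kind: str  # "alu", "tex", or "intrinsic"
    op: str    # opcode or intrinsic name


# Assumed set of ALU derivative opcodes (the "quick switch" for ALU).
ALU_DERIVATIVES = {
    "fddx", "fddy",
    "fddx_fine", "fddy_fine",
    "fddx_coarse", "fddy_coarse",
}

# Assumed set of texture ops that take implicit derivatives.
TEX_IMPLICIT_DERIVATIVE = {"tex", "txb", "lod"}

# Ballot/vote operations from this patch, modeled as intrinsics.
CROSS_THREAD_INTRINSICS = {
    "ballot", "read_invocation", "read_first_invocation",
    "any_invocations", "all_invocations", "all_invocations_equal",
}


def instr_is_convergent(instr):
    """Derivatives and implicit-derivative texture ops are convergent."""
    if instr.kind == "alu":
        return instr.op in ALU_DERIVATIVES
    if instr.kind == "tex":
        return instr.op in TEX_IMPLICIT_DERIVATIVE
    return False


def instr_is_cross_thread(instr):
    """Convergent ops plus the non-convergent ballot/vote intrinsics."""
    if instr_is_convergent(instr):
        return True
    return instr.kind == "intrinsic" and instr.op in CROSS_THREAD_INTRINSICS
```

With this split, a pass that doesn't know about subgroups only has to ask
the two helpers and can otherwise treat ALU instructions as it does today.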

--Jason


> That's all for now,
>
> --Jason
>
>
>> On Mon, Jun 5, 2017 at 2:43 PM, Jason Ekstrand <jason at jlekstrand.net>
>> wrote:
>> > On Mon, Jun 5, 2017 at 1:50 PM, Connor Abbott <cwabbott0 at gmail.com>
>> > wrote:
>> >>
>> >> On Mon, Jun 5, 2017 at 1:37 PM, Jason Ekstrand <jason at jlekstrand.net>
>> >> wrote:
>> >> > I'm not sure how I feel about having these as ALU operations.  ALU
>> >> > operations are generally pure functions (with the exception of
>> >> > derivatives) that can be re-ordered at will.  I don't really like
>> >> > breaking that.  In fact, I'd almost be inclined to make derivatives
>> >> > intrinsics and just special-case them in constant folding.  Thoughts?
>> >>
>> >> I wasn't too sure about this either. It is a little weird to make
>> >> these ALU instructions. I followed the rule here that if something can
>> >> be constant-folded, it should be an ALU instruction, but I guess you
>> >> can argue that it's just a coincidence that these can be
>> >> constant-folded anyways.
>> >
>> >
>> > Yeah.  As subgroup ops get more complicated, I think a lot of the
>> > subgroup operations can be constant-folded after a fashion but the
>> > rules get weird fast.
>> >
>> >>
>> >> I guess the main downside is that it would be
>> >> impossible to make nir_algebraic patterns with these, although I can't
>> >> think of too many simple pattern-matching type things you'd want to do
>> >> on these instructions anyways.
>> >
>> >
>> > Yeah.  My gut also tells me that shaders which are "advanced" enough to
>> > use subgroup features probably don't need the massive reductions we do
>> > for D3D9-generated shaders (or those reductions can't be done on them).
>> >
>> >>
>> >> Maybe something like not(any(not(foo)))
>> >> -> all(foo) and vice-versa?
>> >>
>> >> >
>> >> > On Mon, Jun 5, 2017 at 12:22 PM, Connor Abbott <cwabbott0 at gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> Signed-off-by: Connor Abbott <cwabbott0 at gmail.com>
>> >> >> ---
>> >> >>  src/compiler/nir/nir_intrinsics.h | 14 ++++++++++++++
>> >> >>  src/compiler/nir/nir_opcodes.py   | 18 ++++++++++++++++--
>> >> >>  2 files changed, 30 insertions(+), 2 deletions(-)
>> >> >>
>> >> >> diff --git a/src/compiler/nir/nir_intrinsics.h b/src/compiler/nir/nir_intrinsics.h
>> >> >> index 21e7d90..157df7f 100644
>> >> >> --- a/src/compiler/nir/nir_intrinsics.h
>> >> >> +++ b/src/compiler/nir/nir_intrinsics.h
>> >> >> @@ -330,6 +330,20 @@ SYSTEM_VALUE(channel_num, 1, 0, xx, xx, xx)
>> >> >>  SYSTEM_VALUE(alpha_ref_float, 1, 0, xx, xx, xx)
>> >> >>  SYSTEM_VALUE(layer_id, 1, 0, xx, xx, xx)
>> >> >>  SYSTEM_VALUE(view_index, 1, 0, xx, xx, xx)
>> >> >> +SYSTEM_VALUE(subgroup_invocation, 1, 0, xx, xx, xx)
>> >> >> +
>> >> >> +
>> >> >> +/* ARB_shader_ballot instructions */
>> >> >> +
>> >> >> +SYSTEM_VALUE(subgroup_eq_mask, 1, 0, xx, xx, xx)
>> >> >> +SYSTEM_VALUE(subgroup_ge_mask, 1, 0, xx, xx, xx)
>> >> >> +SYSTEM_VALUE(subgroup_gt_mask, 1, 0, xx, xx, xx)
>> >> >> +SYSTEM_VALUE(subgroup_le_mask, 1, 0, xx, xx, xx)
>> >> >> +SYSTEM_VALUE(subgroup_lt_mask, 1, 0, xx, xx, xx)
>> >> >> +
>> >> >> +INTRINSIC(ballot, 1, ARR(0), true, 0, 0, 0, xx, xx, xx,
>> >> >> +          NIR_INTRINSIC_CAN_ELIMINATE | NIR_INTRINSIC_CAN_REORDER |
>> >> >> +          NIR_INTRINSIC_CROSS_THREAD)
>> >> >>
>> >> >>  /* Blend constant color values.  Float values are clamped. */
>> >> >>  SYSTEM_VALUE(blend_const_color_r_float, 1, 0, xx, xx, xx)
>> >> >> diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
>> >> >> index be3ab6d..05a80b2 100644
>> >> >> --- a/src/compiler/nir/nir_opcodes.py
>> >> >> +++ b/src/compiler/nir/nir_opcodes.py
>> >> >> @@ -120,8 +120,10 @@ def opcode(name, output_size, output_type, input_sizes, input_types,
>> >> >>                            input_types, convergent, cross_thread,
>> >> >>                            algebraic_properties, const_expr)
>> >> >>
>> >> >> -def unop_convert(name, out_type, in_type, const_expr):
>> >> >> -   opcode(name, 0, out_type, [0], [in_type], "", const_expr)
>> >> >> +def unop_convert(name, out_type, in_type, const_expr, cross_thread=False,
>> >> >> +                 convergent=False):
>> >> >> +   opcode(name, 0, out_type, [0], [in_type], "", const_expr, convergent,
>> >> >> +          cross_thread)
>> >> >>
>> >> >>  def unop(name, ty, const_expr, convergent=False, cross_thread=False):
>> >> >>     opcode(name, 0, ty, [0], [ty], "", const_expr, convergent, cross_thread)
>> >> >> @@ -355,6 +357,18 @@ for i in xrange(1, 5):
>> >> >>     for j in xrange(1, 5):
>> >> >>        unop_horiz("fnoise{0}_{1}".format(i, j), i, tfloat, j, tfloat, "0.0f")
>> >> >>
>> >> >> +# ARB_shader_ballot instructions
>> >> >> +
>> >> >> +opcode("read_invocation", 0, tuint, [0, 1], [tuint, tuint32], "", "src0",
>> >> >> +        cross_thread=True)
>> >> >> +unop("read_first_invocation", tuint, "src0", cross_thread=True)
>> >> >> +
>> >> >> +# ARB_shader_group_vote instructions
>> >> >> +
>> >> >> +unop("any_invocations", tbool, "src0", cross_thread=True)
>> >> >> +unop("all_invocations", tbool, "src0", cross_thread=True)
>> >> >> +unop("all_invocations_equal", tbool, "true", cross_thread=True)
>> >> >> +
>> >> >>  def binop_convert(name, out_type, in_type, alg_props, const_expr):
>> >> >>     opcode(name, 0, out_type, [0, 0], [in_type, in_type], alg_props, const_expr)
>> >> >>
>> >> >> --
>> >> >> 2.9.3
>> >> >>
>> >> >> _______________________________________________
>> >> >> mesa-dev mailing list
>> >> >> mesa-dev at lists.freedesktop.org
>> >> >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>> >> >
>> >> >
>> >
>> >
>>
>
>
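
For reference, the folding rules in the quoted patch (any/all fold to
"src0", all_invocations_equal folds to "true") and the not(any(not(foo)))
-> all(foo) rewrite mentioned above can be checked with a small Python
model.  This is only a sketch: it models a subgroup as a list of
per-invocation booleans, assumes at least one active invocation, and the
function names mirror the opcodes without being any real API.

```python
from itertools import product


def any_invocations(values):
    """True if the value is true in at least one invocation."""
    return any(values)


def all_invocations(values):
    """True if the value is true in every invocation."""
    return all(values)


def all_invocations_equal(values):
    """True if every invocation holds the same value."""
    return all(v == values[0] for v in values)


def check_identities(width=4):
    """Check the folding rules and De Morgan rewrites for one width."""
    # A constant source is the same in every invocation, so any/all fold
    # to the source itself and all_invocations_equal folds to true.
    for c in (False, True):
        uniform = [c] * width
        assert any_invocations(uniform) == c
        assert all_invocations(uniform) == c
        assert all_invocations_equal(uniform)

    # not(any(not(foo))) == all(foo), and vice versa, for every input.
    for values in product((False, True), repeat=width):
        negated = [not v for v in values]
        assert (not any_invocations(negated)) == all_invocations(values)
        assert (not all_invocations(negated)) == any_invocations(values)
    return True
```

Calling check_identities(4) walks all 16 combinations for a 4-wide
subgroup.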