[Mesa-dev] [PATCH 4/4] nir: add ARB_shader_ballot and ARB_shader_group_vote instructions

Tue Jun 6 20:48:47 UTC 2017

On Tue, Jun 6, 2017 at 1:45 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>
>
> On Mon, Jun 5, 2017 at 9:52 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>>
>> On Mon, Jun 5, 2017 at 6:37 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>>>
>>> I pushed a v2 at
>>> https://cgit.freedesktop.org/~cwabbott0/mesa/log/?h=nir-divergence-v2.
>>> I'm not sure if I like this version better, though. I'll have to think
>>> about it. In the meantime, feel free to take a look.
>>
>>
>> I've taken a skim through the branch and I agree that I'm not sure either.
>> Here are a few thoughts in no particular order:
>>
>>  1) Other than the fact that it's a pile of churn, it doesn't seem to make
>> too much difference whether dFdx and dFdy are ALU or intrinsics
>>
>>  2) Convergent instructions are, in a lot of ways, easier to deal with
>> than plain cross-thread ones.  Convergent ops can always be moved up the
>> dominance tree or down into uniform control-flow.  Regular cross-thread
>> instructions can't be moved across any non-uniform control-flow.
>>
>>  3) dFdx and dFdy are weird because they're convergent so it's clear they
>> are special but not clear they should be intrinsics instead of ALU
>>
>>  4) I like the nir_instr_is_convergent() and nir_instr_is_cross_thread()
>> helpers
>>
>>  5) non-convergent cross-thread instructions should definitely be
>> intrinsics.
>>
>>  6) I think the shader ballot stuff is all non-convergent cross-thread as
>> are some of the more advanced subgroup operations (see HLSL shader model
>> 6.0).
>
>
> Having slept on things a bit, I think I've come to the conclusion that
> leaving dFdx and dFdy as-is should be fine so long as we have the
> nir_instr_is_convergent() and _is_cross_thread() helpers.  We need to do
> special casing in those for texture instructions anyway so adding in a quick
> switch for ALU derivatives isn't bad.  For shader_ballot type instructions,
> I think they're probably best done as intrinsics for now.  That way the
> compiler will leave them alone most of the time and only things that
> actually know what they're doing will ever try to optimize them.
>
> --Jason

Ok, that sounds good.
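
As a rough illustration of the helpers discussed above, here is a toy Python
model. It is not NIR code: the Instr class, the op-name sets, and
can_hoist_past_divergent_branch() are hypothetical names made up for this
sketch. It assumes that convergent instructions are a subset of cross-thread
ones, and that only non-convergent cross-thread instructions are pinned by
non-uniform control flow.

```python
from dataclasses import dataclass

# Hypothetical op-name sets for the toy model (not NIR's real tables).
DERIVATIVE_ALU_OPS = {"fddx", "fddy"}
IMPLICIT_DERIVATIVE_TEX_OPS = {"tex", "txb"}
CROSS_THREAD_INTRINSICS = {
    "ballot", "read_invocation", "read_first_invocation",
    "any_invocations", "all_invocations", "all_invocations_equal",
}


@dataclass(frozen=True)
class Instr:
    kind: str  # "alu", "tex" or "intrinsic"
    op: str


def instr_is_convergent(instr):
    """ALU derivatives and implicit-derivative texturing get special-cased."""
    if instr.kind == "alu":
        return instr.op in DERIVATIVE_ALU_OPS
    if instr.kind == "tex":
        return instr.op in IMPLICIT_DERIVATIVE_TEX_OPS
    return False


def instr_is_cross_thread(instr):
    """Assumption: every convergent instruction is also cross-thread."""
    if instr_is_convergent(instr):
        return True
    return instr.kind == "intrinsic" and instr.op in CROSS_THREAD_INTRINSICS


def can_hoist_past_divergent_branch(instr):
    """Per-thread and convergent ops may move up the dominance tree;
    non-convergent cross-thread ops may not cross non-uniform control flow."""
    return not instr_is_cross_thread(instr) or instr_is_convergent(instr)
```

With this model, Instr("alu", "fddx") is both convergent and cross-thread and
can still be hoisted, while Instr("intrinsic", "ballot") is cross-thread only
and stays where it is.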

>
>>
>> That's all for now,
>>
>> --Jason
>>
>>>
>>> On Mon, Jun 5, 2017 at 2:43 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>>> > On Mon, Jun 5, 2017 at 1:50 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>>> >>
>>> >> On Mon, Jun 5, 2017 at 1:37 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>>> >> > I'm not sure how I feel about having these as ALU operations.  ALU
>>> >> > operations are generally pure functions (with the exception of
>>> >> > derivatives) that can be re-ordered at will.  I don't really like
>>> >> > breaking that.  In fact, I'd almost be inclined to make derivatives
>>> >> > intrinsics and just special-case them in constant folding.  Thoughts?
>>> >>
>>> >> I wasn't too sure about this either. It is a little weird to make
>>> >> these ALU instructions. I followed the rule here that if something can
>>> >> be constant-folded, it should be an ALU instruction, but I guess you
>>> >> can argue that it's just a coincidence that these can be
>>> >> constant-folded anyways.
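
To make the constant-folding argument concrete, here is a small Python sketch
under a toy model: a subgroup is a non-empty list holding the value seen by
each active invocation. The function names mirror the opcodes in the patch,
but the list-based model and the two check_* helpers are assumptions made up
for illustration. A compile-time constant source is the same in every
invocation, which is why the const_expr strings in the patch ("src0" and
"true") give the right answer.

```python
def any_invocations(lanes):
    return any(lanes)


def all_invocations(lanes):
    return all(lanes)


def all_invocations_equal(lanes):
    return all(v == lanes[0] for v in lanes)


def read_first_invocation(lanes):
    return lanes[0]


def read_invocation(lanes, index):
    # Assumes index names an active invocation in this toy model.
    return lanes[index]


def check_vote_folding(src0, width):
    """Vote ops on a constant boolean fold to src0 (or to true)."""
    lanes = [src0] * width  # a constant is identical in every invocation
    return (any_invocations(lanes) == src0
            and all_invocations(lanes) == src0
            and all_invocations_equal(lanes) is True)


def check_read_folding(src0, width):
    """Reads of a constant fold to src0 regardless of which lane is read."""
    lanes = [src0] * width
    return (read_first_invocation(lanes) == src0
            and all(read_invocation(lanes, i) == src0 for i in range(width)))
```

Both checks hold for any subgroup width of at least one. Once the source can
differ between invocations, none of these results can be computed from a
single value of src0.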
>>> >
>>> >
>>> > Yeah.  As subgroup ops get more complicated, I think a lot of the
>>> > subgroup operations can be constant-folded after a fashion but the
>>> > rules get weird fast.
>>> >
>>> >>
>>> >> I guess the main downside is that it would be
>>> >> impossible to make nir_algebraic patterns with these, although I can't
>>> >> think of too many simple pattern-matching type things you'd want to do
>>> >> on these instructions anyways.
>>> >
>>> >
>>> > Yeah.  My gut also tells me that shaders which are "advanced" enough to
>>> > use subgroup features probably don't need the massive reductions we do
>>> > for D3D9-generated shaders (or those reductions can't be done on them).
>>> >
>>> >>
>>> >> Maybe something like not(any(not(foo)))
>>> >> -> all(foo) and vice-versa?
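
The rewrite suggested above is De Morgan's law applied across the subgroup.
As a quick sanity check, the following Python sketch tests it exhaustively for
small subgroups. It models a subgroup as a tuple of per-invocation booleans;
that model and the rewrite_holds() helper are assumptions for illustration
only, not anything from nir_algebraic.

```python
from itertools import product


def any_invocations(lanes):
    return any(lanes)


def all_invocations(lanes):
    return all(lanes)


def rewrite_holds(width):
    """Check not(any(not(foo))) == all(foo), and the dual with any and all
    swapped, for every combination of per-invocation booleans."""
    for lanes in product([False, True], repeat=width):
        not_foo = [not v for v in lanes]
        if (not any_invocations(not_foo)) != all_invocations(lanes):
            return False
        if (not all_invocations(not_foo)) != any_invocations(lanes):
            return False
    return True
```

rewrite_holds(width) returns True for every width, so both directions of the
pattern are safe as far as this toy model goes.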
>>> >>
>>> >> >
>>> >> > On Mon, Jun 5, 2017 at 12:22 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>>> >> >>
>>> >> >> Signed-off-by: Connor Abbott <cwabbott0 at gmail.com>
>>> >> >> ---
>>> >> >>  src/compiler/nir/nir_intrinsics.h | 14 ++++++++++++++
>>> >> >>  src/compiler/nir/nir_opcodes.py   | 18 ++++++++++++++++--
>>> >> >>  2 files changed, 30 insertions(+), 2 deletions(-)
>>> >> >>
>>> >> >> diff --git a/src/compiler/nir/nir_intrinsics.h b/src/compiler/nir/nir_intrinsics.h
>>> >> >> index 21e7d90..157df7f 100644
>>> >> >> --- a/src/compiler/nir/nir_intrinsics.h
>>> >> >> +++ b/src/compiler/nir/nir_intrinsics.h
>>> >> >> @@ -330,6 +330,20 @@ SYSTEM_VALUE(channel_num, 1, 0, xx, xx, xx)
>>> >> >>  SYSTEM_VALUE(alpha_ref_float, 1, 0, xx, xx, xx)
>>> >> >>  SYSTEM_VALUE(layer_id, 1, 0, xx, xx, xx)
>>> >> >>  SYSTEM_VALUE(view_index, 1, 0, xx, xx, xx)
>>> >> >> +SYSTEM_VALUE(subgroup_invocation, 1, 0, xx, xx, xx)
>>> >> >> +
>>> >> >> +
>>> >> >> +/* ARB_shader_ballot instructions */
>>> >> >> +
>>> >> >> +SYSTEM_VALUE(subgroup_eq_mask, 1, 0, xx, xx, xx)
>>> >> >> +SYSTEM_VALUE(subgroup_ge_mask, 1, 0, xx, xx, xx)
>>> >> >> +SYSTEM_VALUE(subgroup_gt_mask, 1, 0, xx, xx, xx)
>>> >> >> +SYSTEM_VALUE(subgroup_le_mask, 1, 0, xx, xx, xx)
>>> >> >> +SYSTEM_VALUE(subgroup_lt_mask, 1, 0, xx, xx, xx)
>>> >> >> +
>>> >> >> +INTRINSIC(ballot, 1, ARR(0), true, 0, 0, 0, xx, xx, xx,
>>> >> >> +          NIR_INTRINSIC_CAN_ELIMINATE | NIR_INTRINSIC_CAN_REORDER |
>>> >> >> +          NIR_INTRINSIC_CROSS_THREAD)
>>> >> >>
>>> >> >>  /* Blend constant color values.  Float values are clamped. */
>>> >> >>  SYSTEM_VALUE(blend_const_color_r_float, 1, 0, xx, xx, xx)
>>> >> >> diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
>>> >> >> index be3ab6d..05a80b2 100644
>>> >> >> --- a/src/compiler/nir/nir_opcodes.py
>>> >> >> +++ b/src/compiler/nir/nir_opcodes.py
>>> >> >> @@ -120,8 +120,10 @@ def opcode(name, output_size, output_type, input_sizes, input_types,
>>> >> >>                            input_types, convergent, cross_thread,
>>> >> >>                            algebraic_properties, const_expr)
>>> >> >>
>>> >> >> -def unop_convert(name, out_type, in_type, const_expr):
>>> >> >> -   opcode(name, 0, out_type, [0], [in_type], "", const_expr)
>>> >> >> +def unop_convert(name, out_type, in_type, const_expr, cross_thread=False,
>>> >> >> +                 convergent=False):
>>> >> >> +   opcode(name, 0, out_type, [0], [in_type], "", const_expr, convergent,
>>> >> >> +          cross_thread)
>>> >> >>
>>> >> >>  def unop(name, ty, const_expr, convergent=False, cross_thread=False):
>>> >> >>     opcode(name, 0, ty, [0], [ty], "", const_expr, convergent, cross_thread)
>>> >> >> @@ -355,6 +357,18 @@ for i in xrange(1, 5):
>>> >> >>     for j in xrange(1, 5):
>>> >> >>        unop_horiz("fnoise{0}_{1}".format(i, j), i, tfloat, j, tfloat, "0.0f")
>>> >> >>
>>> >> >> +# ARB_shader_ballot instructions
>>> >> >> +
>>> >> >> +opcode("read_invocation", 0, tuint, [0, 1], [tuint, tuint32], "", "src0",
>>> >> >> +        cross_thread=True)
>>> >> >> +unop("read_first_invocation", tuint, "src0", cross_thread=True)
>>> >> >> +
>>> >> >> +# ARB_shader_group_vote instructions
>>> >> >> +
>>> >> >> +unop("any_invocations", tbool, "src0", cross_thread=True)
>>> >> >> +unop("all_invocations", tbool, "src0", cross_thread=True)
>>> >> >> +unop("all_invocations_equal", tbool, "true", cross_thread=True)
>>> >> >> +
>>> >> >>  def binop_convert(name, out_type, in_type, alg_props, const_expr):
>>> >> >>     opcode(name, 0, out_type, [0, 0], [in_type, in_type], alg_props, const_expr)
>>> >> >>
>>> >> >> --
>>> >> >> 2.9.3
>>> >> >>