[Mesa-dev] [PATCH 4/4] nir: add ARB_shader_ballot and ARB_shader_group_vote instructions

Connor Abbott cwabbott0 at gmail.com
Tue Jun 6 21:17:04 UTC 2017


On Tue, Jun 6, 2017 at 1:48 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:
> On Tue, Jun 6, 2017 at 1:45 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>>
>>
>> On Mon, Jun 5, 2017 at 9:52 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>>>
>>> On Mon, Jun 5, 2017 at 6:37 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>>>>
>>>> I pushed a v2 at
>>>> https://cgit.freedesktop.org/~cwabbott0/mesa/log/?h=nir-divergence-v2.
>>>> I'm not sure if I like this version better, though. I'll have to think
>>>> about it. In the meantime, feel free to take a look.
>>>
>>>
>>> I've taken a skim through the branch and I agree that I'm not sure either.
>>> Here are a few thoughts in no particular order:
>>>
>>>  1) Other than the fact that it's a pile of churn, it doesn't seem to make
>>> too much difference whether dFdx and dFdy are ALU or intrinsics
>>>
>>>  2) Convergent instructions are, in a lot of ways, easier to deal with
>>> than plain cross-thread ones.  Convergent ops can always be moved up the
>>> dominance tree or down into uniform control-flow.  Regular cross-thread
>>> instructions can't be moved across any non-uniform control-flow.
>>>
>>>  3) dFdx and dFdy are weird because they're convergent so it's clear they
>>> are special but not clear they should be intrinsics instead of ALU
>>>
>>>  4) I like the nir_instr_is_convergent() and nir_instr_is_cross_thread()
>>> helpers
>>>
>>>  5) non-convergent cross-thread instructions should definitely be
>>> intrinsics.
>>>
>>>  6) I think the shader ballot stuff is all non-convergent cross-thread as
>>> are some of the more advanced subgroup operations (see HLSL shader model
>>> 6.0).
>>
>>
>> Having slept on things a bit, I think I've come to the conclusion that
>> leaving dFdx and dFdy as-is should be fine so long as we have the
>> nir_instr_is_convergent() and _is_cross_thread() helpers.  We need to do
>> special casing in those for texture instructions anyway so adding in a quick
>> switch for ALU derivatives isn't bad.  For shader_ballot type instructions,
>> I think they're probably best done as intrinsics for now.  That way the
>> compiler will leave them alone most of the time and only things that
>> actually know what they're doing will ever try to optimize them.
>>
>> --Jason
>
> Ok, that sounds good.

I pushed a nir-divergence-v3 branch which does just that. I'll start
using that as a base for my work on radv.
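
For anyone skimming the thread, here is roughly what the two helpers and
the code-motion rule from Jason's list amount to, written as a small Python
model rather than the C in the branch. The Instr class, the flag values,
the opcode sets and can_move() are illustrative assumptions, not the actual
NIR definitions; the model also assumes that "convergent" is a stricter
kind of "cross-thread".

```python
from dataclasses import dataclass
from enum import Flag


class IntrinsicFlags(Flag):
    # Hypothetical stand-ins for the NIR_INTRINSIC_* flags in the patch.
    CAN_ELIMINATE = 1
    CAN_REORDER = 2
    CROSS_THREAD = 4
    CONVERGENT = 8


@dataclass
class Instr:
    kind: str  # "alu", "tex" or "intrinsic"
    op: str
    flags: IntrinsicFlags = IntrinsicFlags(0)


# Assumed opcode sets, for illustration only.
ALU_DERIVATIVES = {"fddx", "fddy"}
IMPLICIT_DERIVATIVE_TEX = {"tex", "txb"}


def instr_is_convergent(instr):
    """Derivative-like ops: special-cased for ALU and texture instructions."""
    if instr.kind == "alu":
        return instr.op in ALU_DERIVATIVES
    if instr.kind == "tex":
        return instr.op in IMPLICIT_DERIVATIVE_TEX
    if instr.kind == "intrinsic":
        return bool(instr.flags & IntrinsicFlags.CONVERGENT)
    return False


def instr_is_cross_thread(instr):
    """Anything whose result depends on other invocations."""
    if instr_is_convergent(instr):
        return True
    if instr.kind == "intrinsic":
        return bool(instr.flags & IntrinsicFlags.CROSS_THREAD)
    return False


def can_move(instr, direction, crosses_nonuniform_cf):
    """direction is "up" (towards dominators) or "down" (into control flow)."""
    if not instr_is_cross_thread(instr):
        return True   # ordinary instruction (other side effects ignored here)
    if not crosses_nonuniform_cf:
        return True   # uniform control flow is fine for both kinds
    if instr_is_convergent(instr):
        return direction == "up"   # may be hoisted, never sunk into divergence
    return False      # plain cross-thread ops stay where they are


ballot = Instr("intrinsic", "ballot",
               IntrinsicFlags.CAN_ELIMINATE | IntrinsicFlags.CAN_REORDER |
               IntrinsicFlags.CROSS_THREAD)
ddx = Instr("alu", "fddx")
fadd = Instr("alu", "fadd")
```

In this model ballot is cross-thread but not convergent, so can_move()
refuses to take it across divergent control flow in either direction, while
fddx can still be hoisted.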

>
>>
>>>
>>> That's all for now,
>>>
>>> --Jason
>>>
>>>>
>>>> On Mon, Jun 5, 2017 at 2:43 PM, Jason Ekstrand <jason at jlekstrand.net>
>>>> wrote:
>>>> > On Mon, Jun 5, 2017 at 1:50 PM, Connor Abbott <cwabbott0 at gmail.com>
>>>> > wrote:
>>>> >>
>>>> >> On Mon, Jun 5, 2017 at 1:37 PM, Jason Ekstrand <jason at jlekstrand.net>
>>>> >> wrote:
>>>> >> > I'm not sure how I feel about having these as ALU operations.  ALU
>>>> >> > operations are generally pure functions (with the exception of
>>>> >> > derivatives) that can be re-ordered at will.  I don't really like
>>>> >> > breaking that.  In fact, I'd almost be inclined to make derivatives
>>>> >> > intrinsics and just special-case them in constant folding.  Thoughts?
>>>> >>
>>>> >> I wasn't too sure about this either. It is a little weird to make
>>>> >> these ALU instructions. I followed the rule here that if something can
>>>> >> be constant-folded, it should be an ALU instruction, but I guess you
>>>> >> can argue that it's just a coincidence that these can be
>>>> >> constant-folded anyways.
>>>> >
>>>> >
>>>> > Yeah.  As subgroup ops get more complicated, I think a lot of the
>>>> > subgroup operations can be constant-folded after a fashion but the
>>>> > rules get weird fast.
>>>> >
>>>> >>
>>>> >> I guess the main downside is that it would be
>>>> >> impossible to make nir_algebraic patterns with these, although I can't
>>>> >> think of too many simple pattern-matching type things you'd want to do
>>>> >> on these instructions anyways.
>>>> >
>>>> >
>>>> > Yeah.  My gut also tells me that shaders which are "advanced" enough to
>>>> > use subgroup features probably don't need the massive reductions we do
>>>> > for D3D9-generated shaders (or those reductions can't be done on them).
>>>> >
>>>> >>
>>>> >> Maybe something like not(any(not(foo)))
>>>> >> -> all(foo) and vice-versa?
>>>> >>
>>>> >> >
>>>> >> > On Mon, Jun 5, 2017 at 12:22 PM, Connor Abbott <cwabbott0 at gmail.com>
>>>> >> > wrote:
>>>> >> >>
>>>> >> >> Signed-off-by: Connor Abbott <cwabbott0 at gmail.com>
>>>> >> >> ---
>>>> >> >>  src/compiler/nir/nir_intrinsics.h | 14 ++++++++++++++
>>>> >> >>  src/compiler/nir/nir_opcodes.py   | 18 ++++++++++++++++--
>>>> >> >>  2 files changed, 30 insertions(+), 2 deletions(-)
>>>> >> >>
>>>> >> >> diff --git a/src/compiler/nir/nir_intrinsics.h b/src/compiler/nir/nir_intrinsics.h
>>>> >> >> index 21e7d90..157df7f 100644
>>>> >> >> --- a/src/compiler/nir/nir_intrinsics.h
>>>> >> >> +++ b/src/compiler/nir/nir_intrinsics.h
>>>> >> >> @@ -330,6 +330,20 @@ SYSTEM_VALUE(channel_num, 1, 0, xx, xx, xx)
>>>> >> >>  SYSTEM_VALUE(alpha_ref_float, 1, 0, xx, xx, xx)
>>>> >> >>  SYSTEM_VALUE(layer_id, 1, 0, xx, xx, xx)
>>>> >> >>  SYSTEM_VALUE(view_index, 1, 0, xx, xx, xx)
>>>> >> >> +SYSTEM_VALUE(subgroup_invocation, 1, 0, xx, xx, xx)
>>>> >> >> +
>>>> >> >> +
>>>> >> >> +/* ARB_shader_ballot instructions */
>>>> >> >> +
>>>> >> >> +SYSTEM_VALUE(subgroup_eq_mask, 1, 0, xx, xx, xx)
>>>> >> >> +SYSTEM_VALUE(subgroup_ge_mask, 1, 0, xx, xx, xx)
>>>> >> >> +SYSTEM_VALUE(subgroup_gt_mask, 1, 0, xx, xx, xx)
>>>> >> >> +SYSTEM_VALUE(subgroup_le_mask, 1, 0, xx, xx, xx)
>>>> >> >> +SYSTEM_VALUE(subgroup_lt_mask, 1, 0, xx, xx, xx)
>>>> >> >> +
>>>> >> >> +INTRINSIC(ballot, 1, ARR(0), true, 0, 0, 0, xx, xx, xx,
>>>> >> >> +          NIR_INTRINSIC_CAN_ELIMINATE | NIR_INTRINSIC_CAN_REORDER |
>>>> >> >> +          NIR_INTRINSIC_CROSS_THREAD)
>>>> >> >>
>>>> >> >>  /* Blend constant color values.  Float values are clamped. */
>>>> >> >>  SYSTEM_VALUE(blend_const_color_r_float, 1, 0, xx, xx, xx)
>>>> >> >> diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
>>>> >> >> index be3ab6d..05a80b2 100644
>>>> >> >> --- a/src/compiler/nir/nir_opcodes.py
>>>> >> >> +++ b/src/compiler/nir/nir_opcodes.py
>>>> >> >> @@ -120,8 +120,10 @@ def opcode(name, output_size, output_type, input_sizes, input_types,
>>>> >> >>                            input_types, convergent, cross_thread,
>>>> >> >>                            algebraic_properties, const_expr)
>>>> >> >>
>>>> >> >> -def unop_convert(name, out_type, in_type, const_expr):
>>>> >> >> -   opcode(name, 0, out_type, [0], [in_type], "", const_expr)
>>>> >> >> +def unop_convert(name, out_type, in_type, const_expr, cross_thread=False,
>>>> >> >> +                 convergent=False):
>>>> >> >> +   opcode(name, 0, out_type, [0], [in_type], "", const_expr, convergent,
>>>> >> >> +          cross_thread)
>>>> >> >>
>>>> >> >>  def unop(name, ty, const_expr, convergent=False, cross_thread=False):
>>>> >> >>     opcode(name, 0, ty, [0], [ty], "", const_expr, convergent, cross_thread)
>>>> >> >> @@ -355,6 +357,18 @@ for i in xrange(1, 5):
>>>> >> >>     for j in xrange(1, 5):
>>>> >> >>        unop_horiz("fnoise{0}_{1}".format(i, j), i, tfloat, j, tfloat, "0.0f")
>>>> >> >>
>>>> >> >> +# ARB_shader_ballot instructions
>>>> >> >> +
>>>> >> >> +opcode("read_invocation", 0, tuint, [0, 1], [tuint, tuint32], "", "src0",
>>>> >> >> +        cross_thread=True)
>>>> >> >> +unop("read_first_invocation", tuint, "src0", cross_thread=True)
>>>> >> >> +
>>>> >> >> +# ARB_shader_group_vote instructions
>>>> >> >> +
>>>> >> >> +unop("any_invocations", tbool, "src0", cross_thread=True)
>>>> >> >> +unop("all_invocations", tbool, "src0", cross_thread=True)
>>>> >> >> +unop("all_invocations_equal", tbool, "true", cross_thread=True)
>>>> >> >> +
>>>> >> >>  def binop_convert(name, out_type, in_type, alg_props, const_expr):
>>>> >> >>     opcode(name, 0, out_type, [0, 0], [in_type, in_type], alg_props, const_expr)
>>>> >> >>
>>>> >> >> --
>>>> >> >> 2.9.3
>>>> >> >>
>>>> >> >> _______________________________________________
>>>> >> >> mesa-dev mailing list
>>>> >> >> mesa-dev at lists.freedesktop.org
>>>> >> >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>>> >> >
>>>> >> >
>>>> >
>>>> >
>>>
>>>
>>
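
On the not(any(not(foo))) -> all(foo) idea and the constant-folding rule
quoted above: a quick brute-force check, with the vote ops modelled as
Python functions over the per-invocation values of the active lanes. The
function names mirror the opcodes in the patch, but the list-of-lanes model
and the subgroup size of 4 are assumptions made purely for illustration,
and at least one active invocation is assumed.

```python
from itertools import product

SUBGROUP_SIZE = 4  # assumed; real subgroup sizes are hardware-dependent


def any_invocations(lanes):
    return any(lanes)


def all_invocations(lanes):
    return all(lanes)


def all_invocations_equal(lanes):
    return all(v == lanes[0] for v in lanes)


def check_vote_identities():
    """not(any(not(x))) == all(x), and vice versa, for every lane pattern."""
    for lanes in product([False, True], repeat=SUBGROUP_SIZE):
        negated = [not v for v in lanes]
        assert (not any_invocations(negated)) == all_invocations(lanes)
        assert (not all_invocations(negated)) == any_invocations(lanes)
    return True


def check_constant_folding():
    """A constant source is the same in every lane, so the ops fold the way
    the const_expr strings in the patch say: "src0", "src0" and "true"."""
    for const in (False, True):
        lanes = [const] * SUBGROUP_SIZE
        assert any_invocations(lanes) == const
        assert all_invocations(lanes) == const
        assert all_invocations_equal(lanes) is True
    return True
```

Both checks pass for this model, which is all the pattern and the
const_expr values rely on; it says nothing about whether such a rewrite is
worth having in nir_algebraic.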

