<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jun 5, 2017 at 9:52 PM, Jason Ekstrand <span dir="ltr"><<a href="mailto:jason@jlekstrand.net" target="_blank">jason@jlekstrand.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class="">On Mon, Jun 5, 2017 at 6:37 PM, Connor Abbott <span dir="ltr"><<a href="mailto:cwabbott0@gmail.com" target="_blank">cwabbott0@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I pushed a v2 at<br> <a href="https://cgit.freedesktop.org/~cwabbott0/mesa/log/?h=nir-divergence-v2" rel="noreferrer" target="_blank">https://cgit.freedesktop.org/~<wbr>cwabbott0/mesa/log/?h=nir-dive<wbr>rgence-v2</a>.<br> I'm not sure if I like this version better, though. I'll have to think<br> about it. In the meantime, feel free to take a look.<br><div class="m_-4930299485887368796HOEnZb"><div class="m_-4930299485887368796h5"></div></div></blockquote><div><br></div></span><div>I've taken a skim through the branch and I agree that I'm not sure either. Here's a few thoughts in no particular order:<br><br></div><div> 1) Other than the fact that it's a pile of churn, it doesn't seem to make too much difference whether dFdx and dFdy are ALU or intrinsics<br><br></div><div> 2) Convergent instructions are, in a lot of ways, easier to deal with than plain cross-thread ones. Convergent ops can always be moved up the dominance tree or down into uniform control-flow. 
Regular cross-thread instructions can't be moved across any non-uniform control-flow.<br><br></div><div> 3) dFdx and dFdy are weird: they're convergent, so it's clear they are special, but it's not clear they should be intrinsics instead of ALU ops.<br><br></div><div> 4) I like the nir_instr_is_convergent() and nir_instr_is_cross_thread() helpers.<br><br></div><div> 5) Non-convergent cross-thread instructions should definitely be intrinsics.<br><br></div><div> 6) I think the shader ballot stuff is all non-convergent cross-thread, as are some of the more advanced subgroup operations (see HLSL shader model 6.0).<br></div></div></div></div></blockquote><div><br></div><div>Having slept on things a bit, I've come to the conclusion that leaving dFdx and dFdy as-is should be fine so long as we have the nir_instr_is_convergent() and nir_instr_is_cross_thread() helpers. We need to special-case texture instructions in those anyway, so adding a quick switch for ALU derivatives isn't bad. For shader_ballot type instructions, I think they're probably best done as intrinsics for now. 
That way the compiler will leave them alone most of the time and only things that actually know what they're doing will ever try to optimize them.<br><br></div><div>--Jason<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div></div><div>That's all for now,<br><br></div><div>--Jason<br></div><div><div class="h5"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="m_-4930299485887368796HOEnZb"><div class="m_-4930299485887368796h5"> On Mon, Jun 5, 2017 at 2:43 PM, Jason Ekstrand <<a href="mailto:jason@jlekstrand.net" target="_blank">jason@jlekstrand.net</a>> wrote:<br> > On Mon, Jun 5, 2017 at 1:50 PM, Connor Abbott <<a href="mailto:cwabbott0@gmail.com" target="_blank">cwabbott0@gmail.com</a>> wrote:<br> >><br> >> On Mon, Jun 5, 2017 at 1:37 PM, Jason Ekstrand <<a href="mailto:jason@jlekstrand.net" target="_blank">jason@jlekstrand.net</a>><br> >> wrote:<br> >> > I'm not sure how I feel about having these as ALU operations. ALU<br> >> > operations are generally pure functions (with the exception of derivatives) that<br> >> > can be re-ordered at will. I don't really like breaking that. In fact, I'd<br> >> > almost be inclined to make derivatives intrinsics and just special-case them<br> >> > in constant folding. Thoughts?<br> >><br> >> I wasn't too sure about this either. It is a little weird to make<br> >> these ALU instructions. I followed the rule here that if something can<br> >> be constant-folded, it should be an ALU instruction, but I guess you<br> >> can argue that it's just a coincidence that these can be<br> >> constant-folded anyways.<br> ><br> ><br> > Yeah. 
As subgroup ops get more complicated, I think a lot of the subgroup<br> > operations can be constant-folded after a fashion, but the rules get weird<br> > fast.<br> ><br> >><br> >> I guess the main downside is that it would be<br> >> impossible to make nir_algebraic patterns with these, although I can't<br> >> think of too many simple pattern-matching type things you'd want to do<br> >> on these instructions anyways.<br> ><br> ><br> > Yeah. My gut also tells me that shaders which are "advanced" enough to use<br> > subgroup features probably don't need the massive reductions we do for<br> > D3D9-generated shaders (or those reductions can't be done on them anyway).<br> ><br> >><br> >> Maybe something like not(any(not(foo)))<br> >> -> all(foo) and vice-versa?<br> >><br> >> ><br> >> > On Mon, Jun 5, 2017 at 12:22 PM, Connor Abbott <<a href="mailto:cwabbott0@gmail.com" target="_blank">cwabbott0@gmail.com</a>><br> >> > wrote:<br> >> >><br> >> >> Signed-off-by: Connor Abbott <<a href="mailto:cwabbott0@gmail.com" target="_blank">cwabbott0@gmail.com</a>><br> >> >> ---<br> >> >> src/compiler/nir/nir_intrinsic<wbr>s.h | 14 ++++++++++++++<br> >> >> src/compiler/nir/nir_opcodes.p<wbr>y | 18 ++++++++++++++++--<br> >> >> 2 files changed, 30 insertions(+), 2 deletions(-)<br> >> >><br> >> >> diff --git a/src/compiler/nir/nir_intrins<wbr>ics.h<br> >> >> b/src/compiler/nir/nir_intrins<wbr>ics.h<br> >> >> index 21e7d90..157df7f 100644<br> >> >> --- a/src/compiler/nir/nir_intrins<wbr>ics.h<br> >> >> +++ b/src/compiler/nir/nir_intrins<wbr>ics.h<br> >> >> @@ -330,6 +330,20 @@ SYSTEM_VALUE(channel_num, 1, 0, xx, xx, xx)<br> >> >> SYSTEM_VALUE(alpha_ref_float, 1, 0, xx, xx, xx)<br> >> >> SYSTEM_VALUE(layer_id, 1, 0, xx, xx, xx)<br> >> >> SYSTEM_VALUE(view_index, 1, 0, xx, xx, xx)<br> >> >> +SYSTEM_VALUE(subgroup_invocat<wbr>ion, 1, 0, xx, xx, xx)<br> >> >> +<br> >> >> +<br> >> >> +/* ARB_shader_ballot instructions */<br> >> >> +<br> >> >> +SYSTEM_VALUE(subgroup_eq_mask<wbr>, 1, 0, xx, xx, xx)<br> >> >> 
+SYSTEM_VALUE(subgroup_ge_mask<wbr>, 1, 0, xx, xx, xx)<br> >> >> +SYSTEM_VALUE(subgroup_gt_mask<wbr>, 1, 0, xx, xx, xx)<br> >> >> +SYSTEM_VALUE(subgroup_le_mask<wbr>, 1, 0, xx, xx, xx)<br> >> >> +SYSTEM_VALUE(subgroup_lt_mask<wbr>, 1, 0, xx, xx, xx)<br> >> >> +<br> >> >> +INTRINSIC(ballot, 1, ARR(0), true, 0, 0, 0, xx, xx, xx,<br> >> >> + NIR_INTRINSIC_CAN_ELIMINATE | NIR_INTRINSIC_CAN_REORDER |<br> >> >> + NIR_INTRINSIC_CROSS_THREAD)<br> >> >><br> >> >> /* Blend constant color values. Float values are clamped. */<br> >> >> SYSTEM_VALUE(blend_const_color<wbr>_r_float, 1, 0, xx, xx, xx)<br> >> >> diff --git a/src/compiler/nir/nir_opcodes<wbr>.py<br> >> >> b/src/compiler/nir/nir_opcodes<wbr>.py<br> >> >> index be3ab6d..05a80b2 100644<br> >> >> --- a/src/compiler/nir/nir_opcodes<wbr>.py<br> >> >> +++ b/src/compiler/nir/nir_opcodes<wbr>.py<br> >> >> @@ -120,8 +120,10 @@ def opcode(name, output_size, output_type,<br> >> >> input_sizes, input_types,<br> >> >> input_types, convergent, cross_thread,<br> >> >> algebraic_properties, const_expr)<br> >> >><br> >> >> -def unop_convert(name, out_type, in_type, const_expr):<br> >> >> - opcode(name, 0, out_type, [0], [in_type], "", const_expr)<br> >> >> +def unop_convert(name, out_type, in_type, const_expr,<br> >> >> cross_thread=False,<br> >> >> + convergent=False):<br> >> >> + opcode(name, 0, out_type, [0], [in_type], "", const_expr,<br> >> >> convergent,<br> >> >> + cross_thread)<br> >> >><br> >> >> def unop(name, ty, const_expr, convergent=False, cross_thread=False):<br> >> >> opcode(name, 0, ty, [0], [ty], "", const_expr, convergent,<br> >> >> cross_thread)<br> >> >> @@ -355,6 +357,18 @@ for i in xrange(1, 5):<br> >> >> for j in xrange(1, 5):<br> >> >> unop_horiz("fnoise{0}_{1}".for<wbr>mat(i, j), i, tfloat, j, tfloat,<br> >> >> "0.0f")<br> >> >><br> >> >> +# ARB_shader_ballot instructions<br> >> >> +<br> >> >> +opcode("read_invocation", 0, tuint, [0, 1], [tuint, tuint32], "",<br> >> >> "src0",<br> >> >> + 
cross_thread=True)<br> >> >> +unop("read_first_invocation", tuint, "src0", cross_thread=True)<br> >> >> +<br> >> >> +# ARB_shader_group_vote instructions<br> >> >> +<br> >> >> +unop("any_invocations", tbool, "src0", cross_thread=True)<br> >> >> +unop("all_invocations", tbool, "src0", cross_thread=True)<br> >> >> +unop("all_invocations_equal", tbool, "true", cross_thread=True)<br> >> >> +<br> >> >> def binop_convert(name, out_type, in_type, alg_props, const_expr):<br> >> >> opcode(name, 0, out_type, [0, 0], [in_type, in_type], alg_props,<br> >> >> const_expr)<br> >> >><br> >> >> --<br> >> >> 2.9.3<br> >> >><br> >> >> ______________________________<wbr>_________________<br> >> >> mesa-dev mailing list<br> >> >> <a href="mailto:mesa-dev@lists.freedesktop.org" target="_blank">mesa-dev@lists.freedesktop.org</a><br> >> >> <a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/<wbr>mailman/listinfo/mesa-dev</a><br> >> ><br> >> ><br> ><br> ><br> </div></div></blockquote></div></div></div><br></div></div> </blockquote></div><br></div></div>
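<div><br></div><div>For reference, here is a rough C sketch of the code-motion rules from point 2 in this thread. Everything in it is hypothetical and made up for illustration: the two enums and can_move() are not the nir_instr_is_convergent() / nir_instr_is_cross_thread() helpers from the branch, and it ignores ordinary data dependencies. Plain int stands in for bool so the snippet needs no headers.</div>
<pre>
```c
/* Hypothetical instruction classes; not the real NIR data structures. */
enum instr_class {
   INSTR_PURE,         /* ordinary ALU op: a pure function of its sources */
   INSTR_CONVERGENT,   /* e.g. dFdx/dFdy and implicit-derivative texturing */
   INSTR_CROSS_THREAD  /* non-convergent cross-thread op, e.g. ballot */
};

/* Hypothetical description of where a pass wants to move an instruction. */
enum move_dir {
   MOVE_UP_DOMINANCE_TREE, /* hoist into a dominating block */
   MOVE_DOWN_INTO_CF       /* sink into a control-flow construct */
};

/*
 * Returns 1 if the move is allowed under the rules discussed in this
 * thread and 0 otherwise.  nonuniform_cf is 1 when the move would cross
 * or enter non-uniform control flow.
 */
int
can_move(enum instr_class cls, enum move_dir dir, int nonuniform_cf)
{
   switch (cls) {
   case INSTR_PURE:
      /* Pure ALU ops can be re-ordered at will. */
      return 1;
   case INSTR_CONVERGENT:
      /* Always fine going up the dominance tree; going down is only
       * fine when the control flow being entered is uniform. */
      if (dir == MOVE_UP_DOMINANCE_TREE)
         return 1;
      return !nonuniform_cf;
   case INSTR_CROSS_THREAD:
      /* Must never cross non-uniform control flow, in either direction. */
      return !nonuniform_cf;
   }
   return 0;
}
```
</pre>
<div>With this, can_move(INSTR_CONVERGENT, MOVE_UP_DOMINANCE_TREE, 1) is allowed while can_move(INSTR_CROSS_THREAD, MOVE_UP_DOMINANCE_TREE, 1) is not, which is the sense in which convergent ops are the easier of the two to deal with.</div>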