<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Mon, Jun 12, 2017 at 11:58 AM, Nicolai Hähnle <span dir="ltr"><<a href="mailto:nhaehnle@gmail.com" target="_blank">nhaehnle@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On 12.06.2017 20:50, Connor Abbott wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On Mon, Jun 12, 2017 at 2:17 AM, Nicolai Hähnle <<a href="mailto:nhaehnle@gmail.com" target="_blank">nhaehnle@gmail.com</a>> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On 10.06.2017 01:44, Connor Abbott wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
From: Connor Abbott <<a href="mailto:cwabbott0@gmail.com" target="_blank">cwabbott0@gmail.com</a>><br>
<br>
These are properties of the instruction that must be respected when<br>
moving it around, in addition to the usual SSA dominance guarantee.<br>
Previously, we only had special handling for fddx and fddy, in a very<br>
ad-hoc way. But with arb_shader_ballot and arb_shader_group_vote, we'll<br>
have to start handling a lot more instructions with similar constraints,<br>
so we want to add a more formal model of what the optimizer can and<br>
cannot do.<br>
<br>
v2: don't add attribute for ALU instructions<br>
v3: special-case derivative ALU instructions<br>
Signed-off-by: Connor Abbott <<a href="mailto:cwabbott0@gmail.com" target="_blank">cwabbott0@gmail.com</a>><br>
---<br>
src/compiler/nir/nir.h | 80 ++++++++++++++++++++++++++++++<wbr>++++++++++++++++++++<br>
1 file changed, 80 insertions(+)<br>
<br>
diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h<br>
index 3b827bf..64caccb 100644<br>
--- a/src/compiler/nir/nir.h<br>
+++ b/src/compiler/nir/nir.h<br>
@@ -985,6 +985,25 @@ typedef enum {<br>
* intrinsic are due to the register reads/writes.<br>
*/<br>
NIR_INTRINSIC_CAN_REORDER = (1 << 1),<br>
+<br>
+ /**<br>
+ * Indicates whether this intrinsic is "cross-thread". An operation is<br>
+ * cross-thread if results in one thread depend on inputs in another thread,<br>
+ * and therefore optimizations cannot change the execution mask when the<br>
+ * operation is called. Examples of cross-thread operations include<br>
+ * screen-space derivatives, the "any" reduction which returns "true" in<br>
+ * all threads if any thread inputs "true", etc.<br>
+ */<br>
+ NIR_INTRINSIC_CROSS_THREAD = (1 << 2),<br>
+<br>
+ /**<br>
+ * Indicates that this intrinsic is "convergent". An operation is<br>
+ * convergent when it must always be called in convergent control flow,<br>
+ * that is, control flow with the same execution mask as when the program<br>
+ * started. If an operation is convergent, it must be cross-thread as well,<br>
+ * since the optimizer must maintain the guarantee.<br>
+ */<br>
+ NIR_INTRINSIC_CONVERGENT = (1 << 3),<br>
</blockquote>
<br>
<br>
This is inconsistent with LLVM's definition of 'convergent', and I'd like<br>
you to change it to match up with LLVM.<br>
<br>
LLVM's definition of convergent is: "The operation must not be made<br>
control-dependent on additional values."<br>
<br>
In the language of execution masks, this means that optimizations must<br>
guarantee that the execution mask for the instruction can only become a<br>
superset of what it was originally. This means lifting is actually okay.<br>
<br>
This is relevant because e.g. texture instructions with implicit derivatives<br>
are actually convergent operations (in the LLVM sense), but obviously they<br>
can be called with exec masks that are subsets of the exec mask at program<br>
start.<br>
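<br>
As a minimal sketch of that superset rule in C (the exec_mask type and llvm_convergent_move_ok() are hypothetical names for illustration, not NIR or LLVM APIs, and one bit per thread is an assumption):<br>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical execution mask: bit i is set when thread i is active. */
typedef uint32_t exec_mask;

/* LLVM-style "convergent" rule: a transformation may add threads to the
 * set executing the instruction, but must never remove any. In other
 * words, the new mask must be a superset of the old one. */
static bool
llvm_convergent_move_ok(exec_mask before, exec_mask after)
{
   return (before & ~after) == 0;
}
```

With a four-thread wave, lifting an instruction out of a branch taken by threads 0 and 1 (mask 0x3) to the top level (mask 0xf) passes the check, while sinking it the other way does not.<br>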
</blockquote>
<br>
Actually, according to GLSL (and I think SPIR-V, although I'm not 100%<br>
sure), they can't be called that way -- results are undefined if<br>
derivatives (or textures that take implicit derivatives) aren't called<br>
in uniform control flow, full stop. That's why I changed the<br>
definition compared to LLVM - this definition of convergent allows all<br>
the optimizations that the LLVM definition does, but it opens up<br>
additional optimization opportunities since we can assume that control<br>
flow is always uniform when doing divergence analysis. Also, as-is,<br>
the definition matches the GLSL/SPIR-V semantics closely, and since<br>
the purpose of the convergent attribute is to model derivatives in<br>
GLSL and SPIR-V, I'd like to keep that. If GLSL or SPIR-V change their<br>
semantics to allow what you describe, then we can add something<br>
closer to the LLVM convergent semantics. If you want me to<br>
change the name to avoid confusion with LLVM, that's fair though --<br>
suggestions welcome on what to call it ;)<br>
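<br>
The claim that the stricter definition still permits every move the LLVM definition permits can be checked by brute force for a small wave. This is only a sketch under stated assumptions: a 4-thread wave, one mask bit per thread, the mask at an instruction is always a subset of the program-start mask, and all helper names are hypothetical rather than NIR or LLVM APIs:<br>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical execution mask: bit i is set when thread i is active. */
typedef uint32_t exec_mask;

/* Strict rule from the patch: the instruction must execute with exactly
 * the mask the program started with. */
static bool
uniform_control_ok(exec_mask entry, exec_mask at_instr)
{
   return at_instr == entry;
}

/* LLVM-style rule: the mask may only grow (no thread may be removed). */
static bool
llvm_convergent_move_ok(exec_mask before, exec_mask after)
{
   return (before & ~after) == 0;
}

/* For every non-empty program-start mask of a 4-thread wave, take an
 * instruction that is valid under the strict rule (it runs with the
 * entry mask) and try every mask it could be moved to. Any move the
 * LLVM rule accepts must leave it valid under the strict rule too. */
static bool
strict_allows_all_llvm_moves(void)
{
   for (exec_mask entry = 1; entry < 16; entry++) {
      for (exec_mask after = 0; after < 16; after++) {
         if (after & ~entry)
            continue; /* threads inactive at entry cannot be enabled */
         if (llvm_convergent_move_ok(entry, after) &&
             !uniform_control_ok(entry, after))
            return false;
      }
   }
   return true;
}
```

The check passes because a mask that is both a subset and a superset of the entry mask can only be the entry mask itself.<br>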
</blockquote>
<br></div></div>
Okay, I'm convinced that it makes sense to have these semantics, but a different name would be good.<br></blockquote><div><br></div><div>I'm not quite so convinced. :-) The LLVM definition seems, at first blush, more powerful than the proposed definition and I think it's actually what you want for most optimizations. The only advantage I can see to the strict uniform definition is that it would let us infer information about control-flow uniformity from instructions. However, while probably technically correct, that sounds like a dangerous path to go down. What specific optimizations were you thinking this stricter definition would enable?<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
How about NIR_INTRINSIC_UNIFORM_CONTROL?<br></blockquote><div><br></div><div>That works.<br><br></div><div>--Jason<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Cheers,<br>
Nicolai<div class="HOEnZb"><div class="h5"><br>
<br>
<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
LLVM currently has no equivalent to cross_thread, and we hack around it as<br>
I'm sure you're well aware. The nightmare is trying to find a sound<br>
definition of "cross_thread" that works in LLVM's execution model.<br>
</blockquote>
<br>
Yeah... this stuff is really tricky to reason about. I think that<br>
eventually, we're going to have to add the notions of control flow<br>
divergence and re-convergence to LLVM's execution model, even though<br>
there's been pushback from some LLVM developers about it. I just don't<br>
see any way we'll be able to do stuff like LICM, aggressive CSE, etc.<br>
effectively in the presence of these cross-thread operations, when<br>
whether you can do those things at all depends on whether branch<br>
conditions are uniform.<br>
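<br>
To illustrate why hoisting depends on branch uniformity, here is a small simulation of an "any" reduction inside a branch. It is a sketch only: the 4-thread wave and every function name are assumptions made up for the example, not NIR or LLVM code:<br>

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_THREADS 4

/* "any" reduction: true if any active thread supplies a true input. */
static bool
any_active(const bool active[NUM_THREADS], const bool in[NUM_THREADS])
{
   for (int i = 0; i < NUM_THREADS; i++) {
      if (active[i] && in[i])
         return true;
   }
   return false;
}

/* Original program: if (cond) r = any(v); else r = false;
 * Only the threads that took the branch take part in the reduction. */
static void
run_original(const bool cond[NUM_THREADS], const bool v[NUM_THREADS],
             bool out[NUM_THREADS])
{
   bool r = any_active(cond, v);
   for (int i = 0; i < NUM_THREADS; i++)
      out[i] = cond[i] ? r : false;
}

/* Hoisted program: t = any(v); if (cond) r = t; else r = false;
 * Now every thread takes part in the reduction. */
static void
run_hoisted(const bool cond[NUM_THREADS], const bool v[NUM_THREADS],
            bool out[NUM_THREADS])
{
   bool all[NUM_THREADS];
   for (int i = 0; i < NUM_THREADS; i++)
      all[i] = true;

   bool t = any_active(all, v);
   for (int i = 0; i < NUM_THREADS; i++)
      out[i] = cond[i] ? t : false;
}

/* True when hoisting the reduction leaves every thread's result unchanged. */
static bool
hoisting_preserves_results(const bool cond[NUM_THREADS],
                           const bool v[NUM_THREADS])
{
   bool a[NUM_THREADS], b[NUM_THREADS];
   run_original(cond, v, a);
   run_hoisted(cond, v, b);
   for (int i = 0; i < NUM_THREADS; i++) {
      if (a[i] != b[i])
         return false;
   }
   return true;
}
```

When the branch condition is the same in all threads the two versions agree, but with a divergent condition a thread outside the branch can flip the hoisted result.<br>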
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Cheers,<br>
Nicolai<br>
<br>
<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
} nir_intrinsic_semantic_flag;<br>
/**<br>
@@ -1459,6 +1478,67 @@ NIR_DEFINE_CAST(nir_instr_as_parallel_copy, nir_instr,<br>
type, nir_instr_type_parallel_copy)<br>
/*<br>
+ * Helpers to determine if an instruction is cross-thread or convergent. See<br>
+ * NIR_INTRINSIC_{CONVERGENT|CROSS_THREAD} for the definitions.<br>
+ */<br>
+static inline bool<br>
+nir_instr_is_convergent(const nir_instr *instr)<br>
+{<br>
+ switch (instr->type) {<br>
+ case nir_instr_type_alu:<br>
+ switch (nir_instr_as_alu(instr)->op) {<br>
+ case nir_op_fddx:<br>
+ case nir_op_fddy:<br>
+ case nir_op_fddx_fine:<br>
+ case nir_op_fddy_fine:<br>
+ case nir_op_fddx_coarse:<br>
+ case nir_op_fddy_coarse:<br>
+ /* Partial derivatives are convergent */<br>
+ return true;<br>
+<br>
+ default:<br>
+ return false;<br>
+ }<br>
+<br>
+ case nir_instr_type_intrinsic: {<br>
+ nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);<br>
+ return nir_intrinsic_infos[intrin->in<wbr>trinsic].flags &<br>
+ NIR_INTRINSIC_CONVERGENT;<br>
+ }<br>
+<br>
+ case nir_instr_type_tex:<br>
+ switch (nir_instr_as_tex(instr)->op) {<br>
+ case nir_texop_tex:<br>
+ case nir_texop_txb:<br>
+ case nir_texop_lod:<br>
+ /* These three take implicit derivatives, so they are convergent */<br>
+ return true;<br>
+<br>
+ default:<br>
+ return false;<br>
+ }<br>
+<br>
+ default:<br>
+ return false;<br>
+ }<br>
+}<br>
+<br>
+static inline bool<br>
+nir_instr_is_cross_thread(con<wbr>st nir_instr *instr)<br>
+{<br>
+ switch (instr->type) {<br>
+ case nir_instr_type_intrinsic: {<br>
+ nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);<br>
+ return nir_intrinsic_infos[intrin->in<wbr>trinsic].flags &<br>
+ NIR_INTRINSIC_CROSS_THREAD;<br>
+ }<br>
+<br>
+ default:<br>
+ return nir_instr_is_convergent(instr)<wbr>;<br>
+ }<br>
+}<br>
+<br>
+/*<br>
* Control flow<br>
*<br>
* Control flow consists of a tree of control flow nodes, which include<br>
<br>
</blockquote>
<br>
<br>
--<br>
Learn how the world really is,<br>
But never forget how it ought to be.<br>
______________________________<wbr>_________________<br>
mesa-dev mailing list<br>
<a href="mailto:mesa-dev@lists.freedesktop.org" target="_blank">mesa-dev@lists.freedesktop.org</a><br>
<a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/<wbr>mailman/listinfo/mesa-dev</a><br>
</blockquote></blockquote>
<br>
<br>
</div></div></blockquote></div><br></div></div>