<div dir="ltr">On 15 September 2013 00:10, Francisco Jerez <span dir="ltr"><<a href="mailto:currojerez@riseup.net" target="_blank">currojerez@riseup.net</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote"> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Analogous to the GLSL IR predicate with the same name. This patch<br> fixes the three dead code elimination passes and the VEC4/FS<br> instruction scheduling passes so they leave instructions with side<br> effects alone.<br> <br> At some point it might be interesting to have the instruction<br> scheduler calculate the exact memory dependencies between atomic ops,<br> but they're rare enough that it seems unlikely that it will make any<br> practical difference.<br></blockquote><div><br></div><div>Does ARB_shader_atomic_counters guarantee that order is properly preserved between atomic read operations? In other words, if I do this in a shader:<br><br></div> <div>atomic_uint counter1;<br></div><div>atomic_uint counter2;<br></div><div><br></div><div>void main() {<br> ...<br></div><div> uint a = atomicCounter(counter1);<br></div><div> uint b = atomicCounter(counter2);<br>}<br> <br></div><div>can I be guaranteed that the read from counter2 will happen after the read from counter1? I can't tell from reading the spec but I'm inclined to think we should assume this is guaranteed, just to be on the safe side.<br> <br></div><div>If we make this assumption, then I believe the has_side_effects() predicate is not enough to guarantee the proper ordering. We would need the scheduling code to use a stronger predicate, requires_exact_ordering(), which returns true for both SHADER_OPCODE_UNTYPED_ATOMIC and SHADER_OPCODE_UNTYPED_SURFACE_READ, to ensure that atomic counter reads don't get reordered with respect to each other.<br> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> ---<br> src/mesa/drivers/dri/i965/brw_fs.cpp | 25 +++++++++-------------<br> .../drivers/dri/i965/brw_schedule_instructions.cpp | 6 +++++-<br> src/mesa/drivers/dri/i965/brw_shader.cpp | 11 ++++++++++<br> src/mesa/drivers/dri/i965/brw_shader.h | 7 ++++++<br> src/mesa/drivers/dri/i965/brw_vec4.cpp | 2 +-<br> 5 files changed, 34 insertions(+), 17 deletions(-)<br> <br> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp<br> index 4afe37b..453752c 100644<br> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp<br> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp<br> @@ -1805,7 +1805,7 @@ fs_visitor::dead_code_eliminate()<br> foreach_list_safe(node, &this->instructions) {<br> fs_inst *inst = (fs_inst *)node;<br> <br> - if (inst->dst.file == GRF) {<br> + if (inst->dst.file == GRF && !inst->has_side_effects()) {<br> assert(this->virtual_grf_end[inst->dst.reg] >= pc);<br> if (this->virtual_grf_end[inst->dst.reg] == pc) {<br> inst->remove();<br> @@ -1943,31 +1943,26 @@ fs_visitor::dead_code_eliminate_local()<br> get_dead_code_hash_entry(ht, inst->dst.reg,<br> inst->dst.reg_offset);<br> <br> - if (inst->is_partial_write()) {<br> - /* For a partial write, we can't remove any previous dead code<br> - * candidate, since we're just modifying their result, but we can<br> - * be dead code eliminiated ourselves.<br> - */<br> - if (entry) {<br> - entry->data = inst;<br> + if (entry) {<br> + if (inst->is_partial_write()) {<br> + /* For a partial write, we can't remove any previous dead code<br> + * candidate, since we're just modifying their result.<br> + */<br></blockquote><div><br></div><div>I'm not terribly familiar with this code, so this may be a stupid question, but:<br><br></div><div>Previous to this patch, if entry was non-NULL and inst->is_partial_write(), we would set entry->data = inst. With your rewrite, that doesn't happen anymore. That seems like a problem.<br> </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> } else {<br> - insert_dead_code_hash(ht, inst->dst.reg, inst->dst.reg_offset,<br> - inst);<br> - }<br> - } else {<br> - if (entry) {<br> /* We're completely updating a channel, and there was a<br> * previous write to the channel that wasn't read. Kill it!<br> */<br> fs_inst *inst = (fs_inst *)entry->data;<br> inst->remove();<br> progress = true;<br> - _mesa_hash_table_remove(ht, entry);<br> }<br> <br> + _mesa_hash_table_remove(ht, entry);<br></blockquote><div><br></div><div>Previously, we would only remove the entry from the hashtable if entry was non-NULL and !inst->is_partial_write(). Now we remove it whenever entry is non-NULL, regardless of whether inst->is_partial_write(). This also seems like a problem.<br> </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> + }<br> +<br> + if (!inst->has_side_effects())<br> insert_dead_code_hash(ht, inst->dst.reg, inst->dst.reg_offset,<br> inst);<br></blockquote><div><br></div><div>Preveiously, we wouldn't insert the instruction in the dead code hash if entry was non-NULL and inst->is_partial_write(). We no longer do that check--was that an intentional change?<br> </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> - }<br> }<br> }<br> }<br> diff --git a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp<br> index 5530683..a688336 100644<br> --- a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp<br> +++ b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp<br> @@ -562,7 +562,8 @@ fs_instruction_scheduler::calculate_deps()<br> schedule_node *n = (schedule_node *)node;<br> fs_inst *inst = (fs_inst *)n->inst;<br> <br> - if (inst->opcode == FS_OPCODE_PLACEHOLDER_HALT)<br> + if (inst->opcode == FS_OPCODE_PLACEHOLDER_HALT ||<br> + inst->has_side_effects())<br> add_barrier_deps(n);<br> <br> /* read-after-write deps. */<br> @@ -795,6 +796,9 @@ vec4_instruction_scheduler::calculate_deps()<br> schedule_node *n = (schedule_node *)node;<br> vec4_instruction *inst = (vec4_instruction *)n->inst;<br> <br> + if (inst->has_side_effects())<br> + add_barrier_deps(n);<br> +<br></blockquote><div><br></div><div>add_barrier_deps() provides a stronger scheduling guarantee than we need. It guarantees that the given instruction executes prior to *all* instructions that follow it and after *all* instructions that precede it. Actually, the only dependencies we need are between two atomic writes, between an atomic write and an atomic read, and (depending what we decide about my question at the top of this email) possibly between two atomic reads.<br> <br></div><div>I'd prefer to see us add just the dependencies we really need, since that will give the scheduler more freedom to move code around.<br><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> /* read-after-write deps. */<br> for (int i = 0; i < 3; i++) {<br> if (inst->src[i].file == GRF) {<br> diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp<br> index 53364a5..7a47c6c 100644<br> --- a/src/mesa/drivers/dri/i965/brw_shader.cpp<br> +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp<br> @@ -566,6 +566,17 @@ backend_instruction::is_control_flow()<br> }<br> }<br> <br> +bool<br> +backend_instruction::has_side_effects() const<br> +{<br> + switch (opcode) {<br> + case SHADER_OPCODE_UNTYPED_ATOMIC:<br> + return true;<br> + default:<br> + return false;<br> + }<br> +}<br> +<br> void<br> backend_visitor::dump_instructions()<br> {<br> diff --git a/src/mesa/drivers/dri/i965/brw_shader.h b/src/mesa/drivers/dri/i965/brw_shader.h<br> index 55769ff..88a5673 100644<br> --- a/src/mesa/drivers/dri/i965/brw_shader.h<br> +++ b/src/mesa/drivers/dri/i965/brw_shader.h<br> @@ -46,6 +46,13 @@ public:<br> bool is_math();<br> bool is_control_flow();<br> <br> + /**<br> + * True if the instruction has side effects other than writing to<br> + * its destination registers. You are expected not to reorder or<br> + * optimize these out unless you know what you are doing.<br> + */<br> + bool has_side_effects() const;<br> +<br> enum opcode opcode; /* BRW_OPCODE_* or FS_OPCODE_* */<br> <br> uint32_t predicate;<br> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp<br> index 6549d4e..1867b22 100644<br> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp<br> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp<br> @@ -313,7 +313,7 @@ vec4_visitor::dead_code_eliminate()<br> foreach_list_safe(node, &this->instructions) {<br> vec4_instruction *inst = (vec4_instruction *)node;<br> <br> - if (inst->dst.file == GRF) {<br> + if (inst->dst.file == GRF && !inst->has_side_effects()) {<br> assert(this->virtual_grf_end[inst->dst.reg] >= pc);<br> if (this->virtual_grf_end[inst->dst.reg] == pc) {<br> inst->remove();<br> <span><font color="#888888">--<br> 1.8.3.4<br> <br> _______________________________________________<br> mesa-dev mailing list<br> <a href="mailto:mesa-dev@lists.freedesktop.org" target="_blank">mesa-dev@lists.freedesktop.org</a><br> <a href="http://lists.freedesktop.org/mailman/listinfo/mesa-dev" target="_blank">http://lists.freedesktop.org/mailman/listinfo/mesa-dev</a><br> </font></span></blockquote></div><br></div></div>