<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Dec 17, 2014 at 1:52 PM, Connor Abbott <span dir="ltr"><<a href="mailto:cwabbott0@gmail.com" target="_blank">cwabbott0@gmail.com</a>></span> wrote:<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I'm sure you're already aware, but there are two things we could do to<br>
speed this up:<br>
<br>
1. Pre-compute def/kill sets for each block similar to what i965 does.<br></blockquote><div><br></div><div>Sure, but we walk the instructions at most deepest block depth + 1 and the depest we've ever seen in the wild is 2.<br><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
2. Use a worklist + an array of flags for "this block is in the<br>
worklist" rather than walking all the basic blocks in reverse to find<br>
the few we need to update.<br></blockquote><div><br></div><div>Sure, we could, but I don't see how pushing the blocks onto a stack and then popping them back off really gains us anything over just walking them. If there's something I'm missing, please let me know because it's not jumping out at me. I'll freely admit that I'm not terribly familiar with worklists.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Wrt #2, we already use a worklist in the DCE pass, but it's kinda lame<br>
because it's using a linked list when we could just allocate an array<br>
of pointers up-front based on the maximum size (the number of blocks<br>
in this case, the number of SSA definitions in that case) and use it<br>
as a ringbuffer. It would be a nice cleanup to implement such a<br>
bounded worklist and share it between these two passes, since we'll<br>
probably want to use it for lots of other passes too.<br>
<br>
I don't either thing should block merging this, though.<br>
<br>
A few other comments below.<br>
<div><div class="h5"><br>
On Tue, Dec 16, 2014 at 1:05 AM, Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>> wrote:<br>
> ---<br>
> src/glsl/Makefile.sources | 1 +<br>
> src/glsl/nir/nir.h | 13 ++<br>
> src/glsl/nir/nir_live_variables.c | 282 ++++++++++++++++++++++++++++++++++++++<br>
> src/glsl/nir/nir_metadata.c | 2 +<br>
> src/mesa/main/bitset.h | 1 +<br>
> 5 files changed, 299 insertions(+)<br>
> create mode 100644 src/glsl/nir/nir_live_variables.c<br>
><br>
> diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources<br>
> index 4eb6320..433224e 100644<br>
> --- a/src/glsl/Makefile.sources<br>
> +++ b/src/glsl/Makefile.sources<br>
> @@ -20,6 +20,7 @@ NIR_FILES = \<br>
> $(GLSL_SRCDIR)/nir/nir_from_ssa.c \<br>
> $(GLSL_SRCDIR)/nir/nir_intrinsics.c \<br>
> $(GLSL_SRCDIR)/nir/nir_intrinsics.h \<br>
> + $(GLSL_SRCDIR)/nir/nir_live_variables.c \<br>
> $(GLSL_SRCDIR)/nir/nir_lower_atomics.c \<br>
> $(GLSL_SRCDIR)/nir/nir_lower_samplers.cpp \<br>
> $(GLSL_SRCDIR)/nir/nir_lower_system_values.c \<br>
> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h<br>
> index f744736..2f2edb6 100644<br>
> --- a/src/glsl/nir/nir.h<br>
> +++ b/src/glsl/nir/nir.h<br>
> @@ -420,6 +420,9 @@ typedef struct {<br>
> /** generic SSA definition index. */<br>
> unsigned index;<br>
><br>
> + /** Index into the live_in and live_out bitfields */<br>
> + unsigned live_index;<br>
> +<br>
> nir_instr *parent_instr;<br>
><br>
> struct set *uses;<br>
> @@ -999,6 +1002,10 @@ typedef struct nir_block {<br>
> struct nir_block **dom_children;<br>
><br>
> struct set *dom_frontier;<br>
> +<br>
> + /* live in and out for this block; used for liveness analysis */<br>
> + BITSET_WORD *live_in;<br>
> + BITSET_WORD *live_out;<br>
> } nir_block;<br>
><br>
> #define nir_block_first_instr(block) \<br>
> @@ -1047,6 +1054,7 @@ typedef enum {<br>
> nir_metadata_none = 0x0,<br>
> nir_metadata_block_index = 0x1,<br>
> nir_metadata_dominance = 0x2,<br>
> + nir_metadata_live_variables = 0x4,<br>
> } nir_metadata;<br>
><br>
> typedef struct {<br>
> @@ -1274,6 +1282,8 @@ bool nir_foreach_src(nir_instr *instr, nir_foreach_src_cb cb, void *state);<br>
> typedef bool (*nir_foreach_block_cb)(nir_block *block, void *state);<br>
> bool nir_foreach_block(nir_function_impl *impl, nir_foreach_block_cb cb,<br>
> void *state);<br>
> +bool nir_foreach_block_reverse(nir_function_impl *impl, nir_foreach_block_cb cb,<br>
> + void *state);<br>
<br>
</div></div>This hunk should go in the previous patch that defined this function.<br>
<div><div class="h5"><br>
><br>
> /* If the following CF node is an if, this function returns that if.<br>
> * Otherwise, it returns NULL.<br>
> @@ -1318,6 +1328,9 @@ void nir_lower_system_values(nir_shader *shader);<br>
><br>
> void nir_lower_atomics(nir_shader *shader);<br>
><br>
> +void nir_live_variables_impl(nir_function_impl *impl);<br>
> +bool nir_ssa_defs_interfere(nir_ssa_def *a, nir_ssa_def *b);<br>
> +<br>
> void nir_convert_to_ssa_impl(nir_function_impl *impl);<br>
> void nir_convert_to_ssa(nir_shader *shader);<br>
> void nir_convert_from_ssa(nir_shader *shader);<br>
> diff --git a/src/glsl/nir/nir_live_variables.c b/src/glsl/nir/nir_live_variables.c<br>
> new file mode 100644<br>
> index 0000000..7d99a06<br>
> --- /dev/null<br>
> +++ b/src/glsl/nir/nir_live_variables.c<br>
> @@ -0,0 +1,282 @@<br>
> +/*<br>
> + * Copyright © 2014 Intel Corporation<br>
> + *<br>
> + * Permission is hereby granted, free of charge, to any person obtaining a<br>
> + * copy of this software and associated documentation files (the "Software"),<br>
> + * to deal in the Software without restriction, including without limitation<br>
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,<br>
> + * and/or sell copies of the Software, and to permit persons to whom the<br>
> + * Software is furnished to do so, subject to the following conditions:<br>
> + *<br>
> + * The above copyright notice and this permission notice (including the next<br>
> + * paragraph) shall be included in all copies or substantial portions of the<br>
> + * Software.<br>
> + *<br>
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR<br>
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,<br>
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL<br>
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER<br>
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING<br>
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS<br>
> + * IN THE SOFTWARE.<br>
> + *<br>
> + * Authors:<br>
> + * Jason Ekstrand (<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>)<br>
> + */<br>
> +<br>
> +#include "nir.h"<br>
> +<br>
> +/*<br>
> + * Basic liveness analysis. This works only in SSA form.<br>
> + *<br>
> + * This liveness pass treats phi nodes as being melded to the space between<br>
> + * blocks so that the destinations of a phi are in the livein of the block<br>
> + * in which it resides and the sources are in the liveout of the<br>
> + * corresponding block. By formulating the liveness information in this<br>
> + * way, we ensure that the definition of any variable dominates its entire<br>
> + * live range. This is true because the only way that the definition of an<br>
> + * SSA value may not dominate a use is if the use is in a phi node and the<br>
> + * uses in phi no are in the live-out of the corresponding predecessor<br>
> + * block but not in the live-in of the block containing the phi node.<br>
> + */<br>
> +<br>
> +struct live_variables_state {<br>
> + unsigned num_ssa_defs;<br>
> + unsigned bitset_words;<br>
> + BITSET_WORD progress;<br>
<br>
</div></div>I'd rather have this be<br>
<br>
bool progress;<br>
<br>
since it's effectively being used as a boolean, no need to confuse<br>
people with this and it's just unnecessary micro-optimization.<br></blockquote><div><br></div><div>Sure. That can be done.<br></div><div>--Jason<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div><div class="h5"><br>
> +};<br>
> +<br>
> +static bool<br>
> +index_dest(nir_dest *dest, void *void_state)<br>
> +{<br>
> + struct live_variables_state *state = void_state;<br>
> +<br>
> + if (dest->is_ssa)<br>
> + dest->ssa.live_index = state->num_ssa_defs++;<br>
> +<br>
> + return true;<br>
> +}<br>
> +<br>
> +static bool<br>
> +index_ssa_definitions_block(nir_block *block, void *void_state)<br>
> +{<br>
> + struct live_variables_state *state = void_state;<br>
> +<br>
> + nir_foreach_instr(block, instr) {<br>
> + if (instr->type == nir_instr_type_ssa_undef) {<br>
> + nir_ssa_undef_instr *undef = nir_instr_as_ssa_undef(instr);<br>
> + undef->def.live_index = 0;<br>
> + } else {<br>
> + nir_foreach_dest(instr, index_dest, state);<br>
> + }<br>
> + }<br>
> +<br>
> + return true;<br>
> +}<br>
> +<br>
> +static bool<br>
> +init_liveness_block(nir_block *block, void *void_state)<br>
> +{<br>
> + struct live_variables_state *state = void_state;<br>
> +<br>
> + block->live_in = reralloc(block, block->live_in, BITSET_WORD,<br>
> + state->bitset_words);<br>
> + memset(block->live_in, 0, state->bitset_words * sizeof(BITSET_WORD));<br>
> +<br>
> + block->live_out = reralloc(block, block->live_out, BITSET_WORD,<br>
> + state->bitset_words);<br>
> + memset(block->live_out, 0, state->bitset_words * sizeof(BITSET_WORD));<br>
> +<br>
> + return true;<br>
> +}<br>
> +<br>
> +static bool<br>
> +set_src_live(nir_src *src, void *void_live)<br>
> +{<br>
> + BITSET_WORD *live = void_live;<br>
> +<br>
> + if (!src->is_ssa)<br>
> + return true;<br>
> +<br>
> + if (src->ssa->live_index == 0)<br>
> + return true; /* undefined variables are never live */<br>
> +<br>
> + BITSET_SET(live, src->ssa->live_index);<br>
> +<br>
> + return true;<br>
> +}<br>
> +<br>
> +static bool<br>
> +set_dest_dead(nir_dest *dest, void *void_live)<br>
> +{<br>
> + BITSET_WORD *live = void_live;<br>
> +<br>
> + if (dest->is_ssa)<br>
> + BITSET_CLEAR(live, dest->ssa.live_index);<br>
> +<br>
> + return true;<br>
> +}<br>
> +<br>
> +/* Phi nodes exist "between" blocks and all the phi nodes at the start of a<br>
> + * block act "in parallel". When we propagate from the live_in of one<br>
> + * block to the live out of the other, we have to kill any writes from phis<br>
> + * and make live any sources.<br>
> + */<br>
> +static void<br>
> +propagate_across_edge(nir_block *pred, nir_block *succ,<br>
> + struct live_variables_state *state)<br>
> +{<br>
> + BITSET_WORD live[state->bitset_words];<br>
> + memcpy(live, succ->live_in, sizeof live);<br>
> +<br>
> + nir_foreach_instr(succ, instr) {<br>
> + if (instr->type != nir_instr_type_phi)<br>
> + break;<br>
> + nir_phi_instr *phi = nir_instr_as_phi(instr);<br>
> +<br>
> + set_dest_dead(&phi->dest, live);<br>
> + }<br>
> +<br>
> + nir_foreach_instr(succ, instr) {<br>
> + if (instr->type != nir_instr_type_phi)<br>
> + break;<br>
> + nir_phi_instr *phi = nir_instr_as_phi(instr);<br>
> +<br>
> + foreach_list_typed(nir_phi_src, src, node, &phi->srcs) {<br>
> + if (src->pred == pred) {<br>
> + set_src_live(&src->src, live);<br>
> + break;<br>
> + }<br>
> + }<br>
> + }<br>
> +<br>
> + for (unsigned i = 0; i < state->bitset_words; ++i) {<br>
> + state->progress |= live[i] & ~pred->live_out[i];<br>
<br>
</div></div>state->progress = state->progress || (live[i] & ~pred->live_out[i]) != 0;<br>
<div><div class="h5"><br>
> + pred->live_out[i] |= live[i];<br>
> + }<br>
> +}<br>
> +<br>
> +static bool<br>
> +walk_instructions_block(nir_block *block, void *void_state)<br>
> +{<br>
> + struct live_variables_state *state = void_state;<br>
> +<br>
> + /* The live out is the union (modulo phi nodes) of the live ins of its<br>
> + * successors */<br>
> + if (block->successors[0])<br>
> + propagate_across_edge(block, block->successors[0], state);<br>
> + if (block->successors[1])<br>
> + propagate_across_edge(block, block->successors[1], state);<br>
> +<br>
> + memcpy(block->live_in, block->live_out,<br>
> + state->bitset_words * sizeof(BITSET_WORD));<br>
> +<br>
> + nir_if *following_if = nir_block_following_if(block);<br>
> + if (following_if)<br>
> + set_src_live(&following_if->condition, block->live_in);<br>
> +<br>
> + nir_foreach_instr_reverse(block, instr) {<br>
> + /* Phi nodes are handled seperately so we want to skip them. Since<br>
> + * we are going backwards and they are at the beginning, we can just<br>
> + * break as soon as we see one.<br>
> + */<br>
> + if (instr->type == nir_instr_type_phi)<br>
> + break;<br>
> +<br>
> + nir_foreach_dest(instr, set_dest_dead, block->live_in);<br>
> + nir_foreach_src(instr, set_src_live, block->live_in);<br>
> + }<br>
> +<br>
> + return true;<br>
> +}<br>
> +<br>
> +static bool<br>
> +src_does_not_use_def(nir_src *src, void *def)<br>
> +{<br>
> + return !src->is_ssa || src->ssa != (nir_ssa_def *)def;<br>
> +}<br>
> +<br>
> +static bool<br>
> +search_for_use_after_instr(nir_instr *start, nir_ssa_def *def)<br>
> +{<br>
> + /* Only look for a use strictly after the given instruction */<br>
> + struct exec_node *node = start->node.next;<br>
> + while (!exec_node_is_tail_sentinel(node)) {<br>
> + nir_instr *instr = exec_node_data(nir_instr, node, node);<br>
> + if (!nir_foreach_src(instr, src_does_not_use_def, def))<br>
> + return true;<br>
> + node = node->next;<br>
> + }<br>
> + return false;<br>
> +}<br>
> +<br>
> +/* Returns true if def is live at instr assuming that def comes before<br>
> + * instr in a pre DFS search of the dominance tree.<br>
> + */<br>
> +static bool<br>
> +nir_ssa_def_is_live_at(nir_ssa_def *def, nir_instr *instr)<br>
> +{<br>
> + if (BITSET_TEST(instr->block->live_out, def->live_index)) {<br>
> + /* Since def dominates instr, if def is in the liveout of the block,<br>
> + * it's live at instr<br>
> + */<br>
> + return true;<br>
> + } else {<br>
> + if (BITSET_TEST(instr->block->live_in, def->live_index) ||<br>
> + def->parent_instr->block == instr->block) {<br>
> + /* In this case it is either live coming into instr's block or it<br>
> + * is defined in the same block. In this case, we simply need to<br>
> + * see if it is used after instr.<br>
> + */<br>
> + return search_for_use_after_instr(instr, def);<br>
> + } else {<br>
> + return false;<br>
> + }<br>
> + }<br>
> +}<br>
> +<br>
> +bool<br>
> +nir_ssa_defs_interfere(nir_ssa_def *a, nir_ssa_def *b)<br>
> +{<br>
> + if (a->parent_instr == b->parent_instr) {<br>
> + /* Two variables defined at the same time interfere assuming at<br>
> + * least one isn't dead.<br>
> + */<br>
> + return true;<br>
> + } else if (a->live_index == 0 || b->live_index == 0) {<br>
> + /* If either variable is an ssa_undef, then there's no interference */<br>
> + return false;<br>
> + } else if (a->live_index < b->live_index) {<br>
> + return nir_ssa_def_is_live_at(a, b->parent_instr);<br>
> + } else {<br>
> + return nir_ssa_def_is_live_at(b, a->parent_instr);<br>
> + }<br>
> +}<br>
> +<br>
> +void<br>
> +nir_live_variables_impl(nir_function_impl *impl)<br>
> +{<br>
> + struct live_variables_state state;<br>
> +<br>
> + /* We start at 1 because we reserve the index value of 0 for ssa_undef<br>
> + * instructions. Those are never live, so their liveness information<br>
> + * can be compacted into a single bit.<br>
> + */<br>
> + state.num_ssa_defs = 1;<br>
> + nir_foreach_block(impl, index_ssa_definitions_block, &state);<br>
> +<br>
> + /* We now know how many unique ssa definitions we have and we can go<br>
> + * ahead and allocate live_in and live_out sets<br>
> + */<br>
> + state.bitset_words = BITSET_WORDS(state.num_ssa_defs);<br>
> + nir_foreach_block(impl, init_liveness_block, &state);<br>
> +<br>
> + /* We need to propagate the liveness back through the CFG. Thanks to<br>
> + * the wonders of SSA, this will run no more times than the depth of the<br>
> + * deepest loop + 1.<br>
> + */<br>
> + do {<br>
> + state.progress = 0;<br>
<br>
</div></div>state.progress = false;<br>
<div><div class="h5"><br>
> + nir_foreach_block_reverse(impl, walk_instructions_block, &state);<br>
> + } while (state.progress);<br>
> +}<br>
> diff --git a/src/glsl/nir/nir_metadata.c b/src/glsl/nir/nir_metadata.c<br>
> index 070cb74..a4d618c 100644<br>
> --- a/src/glsl/nir/nir_metadata.c<br>
> +++ b/src/glsl/nir/nir_metadata.c<br>
> @@ -39,6 +39,8 @@ nir_metadata_require(nir_function_impl *impl, nir_metadata required)<br>
> nir_index_blocks(impl);<br>
> if (NEEDS_UPDATE(nir_metadata_dominance))<br>
> nir_calc_dominance_impl(impl);<br>
> + if (NEEDS_UPDATE(nir_metadata_live_variables))<br>
> + nir_live_variables_impl(impl);<br>
><br>
> #undef NEEDS_UPDATE<br>
><br>
> diff --git a/src/mesa/main/bitset.h b/src/mesa/main/bitset.h<br>
> index 601fd0e..2558da4 100644<br>
> --- a/src/mesa/main/bitset.h<br>
> +++ b/src/mesa/main/bitset.h<br>
> @@ -32,6 +32,7 @@<br>
> #define BITSET_H<br>
><br>
> #include "imports.h"<br>
> +#include "macros.h"<br>
><br>
> /****************************************************************************<br>
> * generic bitset implementation<br>
> --<br>
> 2.2.0<br>
><br>
</div></div>> _______________________________________________<br>
> mesa-dev mailing list<br>
> <a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a><br>
> <a href="http://lists.freedesktop.org/mailman/listinfo/mesa-dev" target="_blank">http://lists.freedesktop.org/mailman/listinfo/mesa-dev</a><br>
</blockquote></div></div></div>