[Mesa-dev] [PATCH v2 11/12] nir: Add a global code motion (GCM) pass
Jason Ekstrand
jason at jlekstrand.net
Mon Feb 9 20:17:21 PST 2015
On Mon, Feb 9, 2015 at 3:31 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:
> On Mon, Feb 9, 2015 at 6:11 PM, Jason Ekstrand <jason at jlekstrand.net>
> wrote:
> >
> >
> > On Mon, Feb 9, 2015 at 12:01 PM, Connor Abbott <cwabbott0 at gmail.com>
> wrote:
> >>
> >> So last time you posted this, I had some suggestions but I realized
> >> that in light of the fact that we want to do value numbering, it
> >> didn't seem like my suggestions were any good. But now that I've
> >> thought about it a little bit, it seems to me like a better plan might
> >> be to pull instructions out of their basic blocks and into the global
> >> list of instructions earlier. That is, instead of marking instructions
> >> as pinned in gcm_pin_instructions_block(), put the ones that aren't
> >> pinned in state->instrs and then instead of walking over all the basic
> >> blocks and bailing on pinned instructions when scheduling early and
> >> late, just walk over state->instrs. This should be faster, and when we
> >> do value-numbering there's a natural place to do it, i.e. after
> >> pulling instructions but before scheduling -- this means we only
> >> value-number the non-pinned instructions, which is what we want since
> >> we're not coalescing the pinned instructions in this pass. This is
> >> also more similar to how it's presented in the paper (although our
> >> scheduling algorithm will still be different I guess).
> >
> >
> > Sorry for not addressing this comment. A couple of things:
> >
> > 1) We need the "is this instruction pinned" metadata for other things
> than
> > just not iterating over them so we need to stash it somehow anyway.
>
> Sure.
>
> >
> > 2) I think we want value numbering to be its own pass. We just won't run
> > the validator in between that and GCM. These passes are tricky and
> complex
> > enough that I don't want to make it even more confusing by saying "Oh,
> now
> > we value-number". Obviously, GVN and GCM will have to be kept in sync
> so we
> > don't combine two things that can't be code-motioned but that's not a big
> > deal.
>
> The thing is, GVN and GCM are very closely related -- not only can you
> not do GVN without GCM afterwards, but as you mentioned you need to
> know which instructions are pinned in order to not combine things that
> can't be combined. So I think calling GVN as a helper/subroutine as a
> part of GCM ("hey, take this list of instructions and merge any
> duplicates") might actually be better than keeping them as two
> kind-of-separate-but-not-really passes that have to be run in a
> specific order and share information -- they're actually *more*
> self-contained with the former approach than the latter, since all GVN
> needs to do is merge the instructions in the list it's given and we
> don't need to run the passes in a specific order without validating
> them in-between.
>
Hm... I do kind of like the whole "Here's a pile of instructions to
value-number" idea. It wouldn't be hard to put that in between the initial
pinning and schedule_early.
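
Something along these lines is what I'm picturing in opt_gcm_impl (just a
sketch; gcm_value_number_instrs is a hypothetical helper that would still
need to be written):

   /* Pin instructions; the movable ones end up on state.instrs */
   nir_foreach_block(impl, gcm_pin_instructions_block, &state);

   /* Hypothetical GVN helper: merge duplicate instructions in the list.
    * Since it only ever sees non-pinned instructions, it can't combine
    * anything that isn't free to move.
    */
   gcm_value_number_instrs(&state.instrs);

   nir_foreach_block(impl, gcm_schedule_early_block, &state);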
> >
> > As far as pulling them out as we pin and then only ever walking over
> > non-pinned instructions: I thought about it.  Then I didn't, for some
> > historical reason that I don't remember anymore.  We could go back and
> > do that and it would be a bit faster.  I can do that if you'd like but
> > I don't think it'll matter much.
>
> True, it'll probably only be worth it if we do what I mentioned above.
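>
> Roughly what I had in mind, reusing the existing state->instrs list
> (untested sketch of the tail of gcm_pin_instructions_block's loop):
>
>       if (pinned) {
>          BITSET_SET(state->pinned, instr->index);
>       } else {
>          exec_node_remove(&instr->node);
>          exec_list_push_tail(&state->instrs, &instr->node);
>       }
>
> and then schedule_early/late (and the GVN helper) would just walk
> state->instrs instead of all the blocks.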
>
> >
> >>
> >>
> >> On Mon, Feb 9, 2015 at 3:19 AM, Jason Ekstrand <jason at jlekstrand.net>
> >> wrote:
> >> > v2 Jason Ekstrand <jason.ekstrand at intel.com>:
> >> > - Use nir_dominance_lca for computing least common ancestors
> >> > - Use the block index for comparing dominance tree depths
> >> > - Pin things that do partial derivatives
> >> > ---
> >> > src/glsl/Makefile.sources | 1 +
> >> > src/glsl/nir/nir.h | 2 +
> >> > src/glsl/nir/nir_opt_gcm.c | 501
> >> > +++++++++++++++++++++++++++++++++++++++++++++
> >> > 3 files changed, 504 insertions(+)
> >> > create mode 100644 src/glsl/nir/nir_opt_gcm.c
> >> >
> >> > diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
> >> > index a580b6e..69cb2e6 100644
> >> > --- a/src/glsl/Makefile.sources
> >> > +++ b/src/glsl/Makefile.sources
> >> > @@ -43,6 +43,7 @@ NIR_FILES = \
> >> > nir/nir_opt_copy_propagate.c \
> >> > nir/nir_opt_cse.c \
> >> > nir/nir_opt_dce.c \
> >> > + nir/nir_opt_gcm.c \
> >> > nir/nir_opt_global_to_local.c \
> >> > nir/nir_opt_peephole_select.c \
> >> > nir/nir_opt_remove_phis.c \
> >> > diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
> >> > index 90a7001..55fb43d 100644
> >> > --- a/src/glsl/nir/nir.h
> >> > +++ b/src/glsl/nir/nir.h
> >> > @@ -1589,6 +1589,8 @@ bool nir_opt_cse(nir_shader *shader);
> >> > bool nir_opt_dce_impl(nir_function_impl *impl);
> >> > bool nir_opt_dce(nir_shader *shader);
> >> >
> >> > +void nir_opt_gcm(nir_shader *shader);
> >> > +
> >> > bool nir_opt_peephole_select(nir_shader *shader);
> >> > bool nir_opt_peephole_ffma(nir_shader *shader);
> >> >
> >> > diff --git a/src/glsl/nir/nir_opt_gcm.c b/src/glsl/nir/nir_opt_gcm.c
> >> > new file mode 100644
> >> > index 0000000..d48518b
> >> > --- /dev/null
> >> > +++ b/src/glsl/nir/nir_opt_gcm.c
> >> > @@ -0,0 +1,501 @@
> >> > +/*
> >> > + * Copyright © 2014 Intel Corporation
> >> > + *
> >> > + * Permission is hereby granted, free of charge, to any person
> >> > obtaining a
> >> > + * copy of this software and associated documentation files (the
> >> > "Software"),
> >> > + * to deal in the Software without restriction, including without
> >> > limitation
> >> > + * the rights to use, copy, modify, merge, publish, distribute,
> >> > sublicense,
> >> > + * and/or sell copies of the Software, and to permit persons to whom
> >> > the
> >> > + * Software is furnished to do so, subject to the following
> conditions:
> >> > + *
> >> > + * The above copyright notice and this permission notice (including
> the
> >> > next
> >> > + * paragraph) shall be included in all copies or substantial portions
> >> > of the
> >> > + * Software.
> >> > + *
> >> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> >> > EXPRESS OR
> >> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> >> > MERCHANTABILITY,
> >> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT
> >> > SHALL
> >> > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES
> OR
> >> > OTHER
> >> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> >> > ARISING
> >> > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> OTHER
> >> > DEALINGS
> >> > + * IN THE SOFTWARE.
> >> > + *
> >> > + * Authors:
> >> > + * Jason Ekstrand (jason at jlekstrand.net)
> >> > + *
> >> > + */
> >> > +
> >> > +#include "nir.h"
> >> > +
> >> > +/*
> >> > + * Implements Global Code Motion. A description of GCM can be found
> in
> >> > + * "Global Code Motion; Global Value Numbering" by Cliff Click.
> >> > + * Unfortunately, the algorithm presented in the paper is broken in a
> >> > + * number of ways. The algorithm used here differs substantially
> from
> >> > the
> >> > + * one in the paper but it is, in my opinion, much easier to read and
> >> > + * verify correctness.
> >> > + */
> >> > +
> >> > +struct gcm_block_info {
> >> > + /* Number of loops this block is inside */
> >> > + unsigned loop_depth;
> >> > +
> >> > + /* The last instruction inserted into this block. This is used as
> >> > we
> >> > + * traverse the instructions and insert them back into the program
> >> > to
> >> > + * put them in the right order.
> >> > + */
> >> > + nir_instr *last_instr;
> >> > +};
> >> > +
> >> > +struct gcm_state {
> >> > + nir_function_impl *impl;
> >> > + nir_instr *instr;
> >> > +
> >> > +   /* Marks all instructions that have been visited by the current pass */
> >> > + BITSET_WORD *visited;
> >> > +
> >> > + /* Marks instructions that are "pinned", i.e. cannot be moved from
> >> > their
> >> > + * basic block by code motion */
> >> > + BITSET_WORD *pinned;
> >>
> >> Note that there's already a boolean field embedded in each instruction
> >> called "live" since it's used by DCE. Maybe we can replace it with a
> >> uint8_t or so with a more generic name like "data" or "bitfield", and
> >> then we can use it for this pass as well to replace both visited and
> >> pinned. We'd only need 4 bits (pinned, scheduled early, scheduled
> >> late, and placed). This would also mean we wouldn't need the
> >> per-instruction index any more.
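> >>
> >> Something like this is what I'm imagining (hypothetical names, nothing
> >> that exists in the tree today):
> >>
> >>    /* in nir_instr, replacing "live": generic per-pass scratch bits */
> >>    uint8_t pass_flags;
> >>
> >>    /* in nir_opt_gcm.c */
> >>    enum {
> >>       GCM_INSTR_PINNED          = (1 << 0),
> >>       GCM_INSTR_SCHEDULED_EARLY = (1 << 1),
> >>       GCM_INSTR_SCHEDULED_LATE  = (1 << 2),
> >>       GCM_INSTR_PLACED          = (1 << 3),
> >>    };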
> >>
> >> > +
> >> > + /* The list of non-pinned instructions. As we do the late
> >> > scheduling,
> >> > + * we pull non-pinned instructions out of their blocks and place
> >> > them in
> >> > + * this list. This saves us from having linked-list problems when
> >> > we go
> >> > + * to put instructions back in their blocks.
> >> > + */
> >> > + struct exec_list instrs;
> >> > +
> >> > + struct gcm_block_info *blocks;
> >> > +};
> >> > +
> >> > +/* Recursively walks the CFG and builds the block_info structure */
> >> > +static void
> >> > +gcm_build_block_info(struct exec_list *cf_list, struct gcm_state
> >> > *state,
> >> > + unsigned loop_depth)
> >> > +{
> >> > + foreach_list_typed(nir_cf_node, node, node, cf_list) {
> >> > + switch (node->type) {
> >> > + case nir_cf_node_block: {
> >> > + nir_block *block = nir_cf_node_as_block(node);
> >> > + state->blocks[block->index].loop_depth = loop_depth;
> >> > + break;
> >> > + }
> >> > + case nir_cf_node_if: {
> >> > + nir_if *if_stmt = nir_cf_node_as_if(node);
> >> > + gcm_build_block_info(&if_stmt->then_list, state,
> loop_depth);
> >> > + gcm_build_block_info(&if_stmt->else_list, state,
> loop_depth);
> >> > + break;
> >> > + }
> >> > + case nir_cf_node_loop: {
> >> > + nir_loop *loop = nir_cf_node_as_loop(node);
> >> > + gcm_build_block_info(&loop->body, state, loop_depth + 1);
> >> > + break;
> >> > + }
> >> > + default:
> >> > + unreachable("Invalid CF node type");
> >> > + }
> >> > + }
> >> > +}
> >> > +
> >> > +/* Walks the instruction list and marks immovable instructions as
> >> > pinned */
> >> > +static bool
> >> > +gcm_pin_instructions_block(nir_block *block, void *void_state)
> >> > +{
> >> > + struct gcm_state *state = void_state;
> >> > +
> >> > + nir_foreach_instr_safe(block, instr) {
> >> > + bool pinned;
> >> > + switch (instr->type) {
> >> > + case nir_instr_type_alu:
> >> > + switch (nir_instr_as_alu(instr)->op) {
> >> > + case nir_op_fddx:
> >> > + case nir_op_fddy:
> >> > + case nir_op_fddx_fine:
> >> > + case nir_op_fddy_fine:
> >> > + case nir_op_fddx_coarse:
> >> > + case nir_op_fddy_coarse:
> >> > +            /* These can only go in uniform control flow; pin them for now */
> >> > +            pinned = true;
> >> > +            break;
> >> > +
> >> > +         default:
> >> > +            pinned = false;
> >> > +         }
> >> > + break;
> >> > +
> >> > + case nir_instr_type_tex:
> >> > + /* We need to pin texture ops that do partial derivatives */
> >> > + pinned = nir_instr_as_tex(instr)->op == nir_texop_txd;
> >>
> >> I don't think this is correct -- txd is when we supply the derivatives
> >> ourselves, so there are no implicit partial derivatives. I *think* the
> >> ones where the HW takes partial derivatives are tex, txb, lod, and tg4
> >> (gather). I'd double-check that with someone else, though, since it's
> >> been a while since I had to do texture stuff.
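> >>
> >> i.e. something along these lines, assuming that list of ops is right
> >> (definitely worth double-checking before landing):
> >>
> >>       case nir_instr_type_tex:
> >>          /* Pin texture ops that take implicit derivatives */
> >>          switch (nir_instr_as_tex(instr)->op) {
> >>          case nir_texop_tex:
> >>          case nir_texop_txb:
> >>          case nir_texop_lod:
> >>          case nir_texop_tg4:
> >>             pinned = true;
> >>             break;
> >>          default:
> >>             pinned = false;
> >>             break;
> >>          }
> >>          break;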
> >>
> >> > + break;
> >> > +
> >> > + case nir_instr_type_load_const:
> >> > + pinned = false;
> >> > + break;
> >> > +
> >> > + case nir_instr_type_intrinsic: {
> >> > + const nir_intrinsic_info *info =
> >> > +
> >> > &nir_intrinsic_infos[nir_instr_as_intrinsic(instr)->intrinsic];
> >> > + pinned = !(info->flags & NIR_INTRINSIC_CAN_ELIMINATE) ||
> >> > + !(info->flags & NIR_INTRINSIC_CAN_REORDER);
> >> > + break;
> >> > + }
> >> > +
> >> > + case nir_instr_type_jump:
> >> > + case nir_instr_type_ssa_undef:
> >> > + case nir_instr_type_phi:
> >> > + pinned = true;
> >> > + break;
> >> > +
> >> > + default:
> >> > + unreachable("Invalid instruction type in GCM");
> >> > + }
> >> > +
> >> > + if (pinned)
> >> > + BITSET_SET(state->pinned, instr->index);
> >> > + }
> >> > +
> >> > + return true;
> >> > +}
> >> > +
> >> > +static void
> >> > +gcm_schedule_early_instr(nir_instr *instr, struct gcm_state *state);
> >> > +
> >> > +/** Updates an instruction's schedule for the given source
> >> > + *
> >> > + * This function is called iteratively as we walk the sources of an
> >> > + * instruction. It ensures that the given source instruction has
> been
> >> > + * scheduled and then updates this instruction's block if the source
> >> > + * instruction is lower down the tree.
> >> > + */
> >> > +static bool
> >> > +gcm_schedule_early_src(nir_src *src, void *void_state)
> >> > +{
> >> > + struct gcm_state *state = void_state;
> >> > + nir_instr *instr = state->instr;
> >> > +
> >> > + assert(src->is_ssa);
> >> > +
> >> > + gcm_schedule_early_instr(src->ssa->parent_instr, void_state);
> >> > +
> >> > + /* While the index isn't a proper dominance depth, it does have
> the
> >> > + * property that if A dominates B then A->index <= B->index.
> Since
> >> > we
> >> > + * know that this instruction must have been dominated by all of
> its
> >> > + * sources at some point (even if it's gone through
> >> > value-numbering),
> >> > + * all of the sources must lie on the same branch of the dominance
> >> > tree.
> >> > +    * Therefore, we can just go ahead and compare indices.
> >> > + */
> >> > + if (instr->block->index < src->ssa->parent_instr->block->index)
> >> > + instr->block = src->ssa->parent_instr->block;
> >> > +
> >> > +   /* We need to restore the state instruction because it may have been
> >> > +    * changed through the gcm_schedule_early_instr call above.  We may
> >> > +    * still be iterating through sources, and future calls to
> >> > +    * gcm_schedule_early_src for the same instruction will still need it.
> >> > +    */
> >> > + state->instr = instr;
> >> > +
> >> > + return true;
> >> > +}
> >> > +
> >> > +/** Schedules an instruction early
> >> > + *
> >> > + * This function performs a recursive depth-first search starting at
> >> > the
> >> > + * given instruction and proceeding through the sources to schedule
> >> > + * instructions as early as they can possibly go in the dominance
> tree.
> >> > + * The instructions are "scheduled" by updating their instr->block
> >> > field.
> >> > + */
> >> > +static void
> >> > +gcm_schedule_early_instr(nir_instr *instr, struct gcm_state *state)
> >> > +{
> >> > + if (BITSET_TEST(state->visited, instr->index))
> >> > + return;
> >> > +
> >> > + BITSET_SET(state->visited, instr->index);
> >> > +
> >> > + /* Pinned instructions are already scheduled so we don't need to
> do
> >> > + * anything. Also, bailing here keeps us from ever following the
> >> > + * sources of phi nodes which can be back-edges.
> >> > + */
> >> > + if (BITSET_TEST(state->pinned, instr->index))
> >> > + return;
> >> > +
> >> > + /* Start with the instruction at the top. As we iterate over the
> >> > + * sources, it will get moved down as needed.
> >> > + */
> >> > + instr->block = state->impl->start_block;
> >> > + state->instr = instr;
> >> > +
> >> > + nir_foreach_src(instr, gcm_schedule_early_src, state);
> >> > +}
> >> > +
> >> > +static bool
> >> > +gcm_schedule_early_block(nir_block *block, void *state)
> >> > +{
> >> > + nir_foreach_instr(block, instr)
> >> > + gcm_schedule_early_instr(instr, state);
> >> > +
> >> > + return true;
> >> > +}
> >> > +
> >> > +static void
> >> > +gcm_schedule_late_instr(nir_instr *instr, struct gcm_state *state);
> >> > +
> >> > +/** Schedules the instruction associated with the given SSA def late
> >> > + *
> >> > + * This function works by first walking all of the uses of the given
> >> > SSA
> >> > + * definition, ensuring that they are scheduled, and then computing
> the
> >> > LCA
> >> > + * (least common ancestor) of its uses. It then schedules this
> >> > instruction
> >> > + * as close to the LCA as possible while trying to stay out of loops.
> >> > + */
> >> > +static bool
> >> > +gcm_schedule_late_def(nir_ssa_def *def, void *void_state)
> >> > +{
> >> > + struct gcm_state *state = void_state;
> >> > +
> >> > + nir_block *lca = NULL;
> >> > +
> >> > + struct set_entry *entry;
> >> > + set_foreach(def->uses, entry) {
> >> > + nir_instr *use_instr = (nir_instr *)entry->key;
> >> > +
> >> > + gcm_schedule_late_instr(use_instr, state);
> >> > +
> >> > + /* Phi instructions are a bit special. SSA definitions don't
> >> > have to
> >> > + * dominate the sources of the phi nodes that use them;
> instead,
> >> > they
> >> > + * have to dominate the predecessor block corresponding to the
> >> > phi
> >> > + * source. We handle this by looking through the sources,
> >> > finding
> >> > +       * any that are using this SSA def, and using those blocks
> >> > instead
> >> > + * of the one the phi lives in.
> >> > + */
> >> > + if (use_instr->type == nir_instr_type_phi) {
> >> > + nir_phi_instr *phi = nir_instr_as_phi(use_instr);
> >> > +
> >> > + nir_foreach_phi_src(phi, phi_src) {
> >> > + if (phi_src->src.ssa == def)
> >> > + lca = nir_dominance_lca(lca, phi_src->pred);
> >> > + }
> >> > + } else {
> >> > + lca = nir_dominance_lca(lca, use_instr->block);
> >> > + }
> >> > + }
> >> > +
> >> > + set_foreach(def->if_uses, entry) {
> >> > + nir_if *if_stmt = (nir_if *)entry->key;
> >> > +
> >> > + /* For if statements, we consider the block to be the one
> >> > immediately
> >> > + * preceding the if CF node.
> >> > + */
> >> > + nir_block *pred_block =
> >> > + nir_cf_node_as_block(nir_cf_node_prev(&if_stmt->cf_node));
> >> > +
> >> > + lca = nir_dominance_lca(lca, pred_block);
> >> > + }
> >> > +
> >> > + /* Some instructions may never be used. We'll just leave them
> >> > scheduled
> >> > +    * early and let dead-code elimination clean them up.
> >> > + */
> >> > + if (lca == NULL)
> >> > + return true;
> >> > +
> >> > +   /* We now have the LCA of all of the uses.  If our invariants
> hold,
> >> > + * this is dominated by the block that we chose when scheduling
> >> > early.
> >> > + * We now walk up the dominance tree and pick the lowest block
> that
> >> > is
> >> > + * as far outside loops as we can get.
> >> > + */
> >> > + nir_block *best = lca;
> >> > + while (lca != def->parent_instr->block) {
> >> > + assert(lca);
> >> > + if (state->blocks[lca->index].loop_depth <
> >> > + state->blocks[best->index].loop_depth)
> >> > + best = lca;
> >> > + lca = lca->imm_dom;
> >> > + }
> >> > + def->parent_instr->block = best;
> >>
> >> I think this loop will exclude def->parent_instr->block from
> >> consideration unless lca == def->parent_instr->block. On the last
> >> iteration, we'll check lca to see if it's the best, then move lca up
> >> the tree, then discover that lca is equal to def->parent_instr->block,
> >> then bail out without checking it. I think you can fix this by
> >> swapping the if-statement and the last statement in the loop, which
> >> will also have the benefit of avoiding the unnecessary test on the
> >> first iteration of the loop. Of course, you'd want to move the assert
> >> as well, so it would look like:
> >>
> >> nir_block *best = lca;
> >> while (lca != def->parent_instr->block) {
> >> lca = lca->imm_dom;
> >> assert(lca);
> >> if (state->blocks[lca->index].loop_depth <
> >> state->blocks[best->index].loop_depth)
> >> best = lca;
> >> }
> >>
> >> > +
> >> > + return true;
> >> > +}
> >> > +
> >> > +/** Schedules an instruction late
> >> > + *
> >> > + * This function performs a depth-first search starting at the given
> >> > + * instruction and proceeding through its uses to schedule
> instructions
> >> > as
> >> > + * late as they can reasonably go in the dominance tree. The
> >> > instructions
> >> > + * are "scheduled" by updating their instr->block field.
> >> > + *
> >> > + * The name of this function is actually a bit of a misnomer as it
> >> > doesn't
> >> > + * schedule them "as late as possible" as the paper implies.
> Instead,
> >> > it
> >> > + * first finds the latest possible place it can schedule the
> instruction
> >> > and
> >> > + * then possibly schedules it earlier than that. The actual location
> >> > is as
> >> > + * far down the tree as we can go while trying to stay out of loops.
> >> > + */
> >> > +static void
> >> > +gcm_schedule_late_instr(nir_instr *instr, struct gcm_state *state)
> >> > +{
> >> > + if (BITSET_TEST(state->visited, instr->index))
> >> > + return;
> >> > +
> >> > + BITSET_SET(state->visited, instr->index);
> >> > +
> >> > + /* Pinned instructions are already scheduled so we don't need to
> do
> >> > + * anything. Also, bailing here keeps us from ever following phi
> >> > nodes
> >> > + * which can be back-edges.
> >> > + */
> >> > + if (BITSET_TEST(state->pinned, instr->index))
> >> > + return;
> >> > +
> >> > + nir_foreach_ssa_def(instr, gcm_schedule_late_def, state);
> >> > +}
> >> > +
> >> > +static bool
> >> > +gcm_schedule_late_block(nir_block *block, void *void_state)
> >> > +{
> >> > + struct gcm_state *state = void_state;
> >> > +
> >> > + nir_foreach_instr_safe(block, instr) {
> >> > + gcm_schedule_late_instr(instr, state);
> >> > +
> >> > + if (!BITSET_TEST(state->pinned, instr->index)) {
> >> > + /* If this is an instruction we can move, go ahead and pull
> it
> >> > out
> >> > + * of the program and put it on the instrs list. This keeps
> >> > us
> >> > + * from causing linked list confusion when we're trying to
> put
> >> > + * everything in its proper place.
> >> > + *
> >> > + * Note that we don't use nir_instr_remove here because that
> >> > also
> >> > + * cleans up uses and defs and we want to keep that
> >> > information.
> >> > + */
> >> > + exec_node_remove(&instr->node);
> >> > + exec_list_push_tail(&state->instrs, &instr->node);
> >> > + }
> >> > + }
> >> > +
> >> > + return true;
> >> > +}
> >> > +
> >> > +static void
> >> > +gcm_place_instr(nir_instr *instr, struct gcm_state *state);
> >> > +
> >> > +static bool
> >> > +gcm_place_instr_def(nir_ssa_def *def, void *state)
> >> > +{
> >> > + struct set_entry *entry;
> >> > + set_foreach(def->uses, entry)
> >> > + gcm_place_instr((nir_instr *)entry->key, state);
> >> > +
> >> > + return false;
> >> > +}
> >> > +
> >> > +/** Places an instruction back into the program
> >> > + *
> >> > + * The earlier passes of GCM simply choose blocks for each
> instruction
> >> > and
> >> > + * otherwise leave them alone. This pass actually places the
> >> > instructions
> >> > + * into their chosen blocks.
> >> > + *
> >> > + * To do so, we use a standard post-order depth-first search
> >> > linearization
> >> > + * algorithm. We walk over the uses of the given instruction and
> >> > ensure
> >> > + * that they are placed and then place this instruction. Because we
> >> > are
> >> > + * working on multiple blocks at a time, we keep track of the last
> >> > inserted
> >> > + * instruction per-block in the state structure's block_info array.
> >> > When
> >> > + * we insert an instruction in a block we insert it before the last
> >> > + * instruction inserted in that block rather than the last
> instruction
> >> > + * inserted globally.
> >> > + */
> >> > +static void
> >> > +gcm_place_instr(nir_instr *instr, struct gcm_state *state)
> >> > +{
> >> > + if (BITSET_TEST(state->visited, instr->index))
> >> > + return;
> >> > +
> >> > + BITSET_SET(state->visited, instr->index);
> >> > +
> >> > +   /* Phi nodes are our one source of back-edges.  Since right now we are
> >> > + * only doing scheduling within blocks, we don't need to worry
> about
> >> > + * them since they are always at the top. Just skip them
> >> > completely.
> >> > + */
> >> > + if (instr->type == nir_instr_type_phi) {
> >> > + assert(BITSET_TEST(state->pinned, instr->index));
> >> > + return;
> >> > + }
> >> > +
> >> > + nir_foreach_ssa_def(instr, gcm_place_instr_def, state);
> >> > +
> >> > + if (BITSET_TEST(state->pinned, instr->index)) {
> >> > + /* Pinned instructions have an implicit dependence on the
> pinned
> >> > + * instructions that come after them in the block. Since the
> >> > pinned
> >> > + * instructions will naturally "chain" together, we only need
> to
> >> > + * explicitly visit one of them.
> >> > + */
> >> > + for (nir_instr *after = nir_instr_next(instr);
> >> > + after;
> >> > + after = nir_instr_next(after)) {
> >> > + if (BITSET_TEST(state->pinned, after->index)) {
> >> > + gcm_place_instr(after, state);
> >> > + break;
> >> > + }
> >> > + }
> >> > + }
> >> > +
> >> > + struct gcm_block_info *block_info =
> >> > &state->blocks[instr->block->index];
> >> > + if (!BITSET_TEST(state->pinned, instr->index)) {
> >> > + exec_node_remove(&instr->node);
> >> > +
> >> > + if (block_info->last_instr) {
> >> > + exec_node_insert_node_before(&block_info->last_instr->node,
> >> > + &instr->node);
> >> > + } else {
> >> > + /* Schedule it at the end of the block */
> >> > + nir_instr *jump_instr = nir_block_last_instr(instr->block);
> >> > + if (jump_instr && jump_instr->type == nir_instr_type_jump) {
> >> > + exec_node_insert_node_before(&jump_instr->node,
> >> > &instr->node);
> >> > + } else {
> >> > + exec_list_push_tail(&instr->block->instr_list,
> >> > &instr->node);
> >> > + }
> >> > + }
> >> > + }
> >> > +
> >> > + block_info->last_instr = instr;
> >> > +}
> >> > +
> >> > +static void
> >> > +opt_gcm_impl(nir_function_impl *impl)
> >> > +{
> >> > + struct gcm_state state;
> >> > +
> >> > + unsigned num_instrs = nir_index_instrs(impl);
> >> > + unsigned instr_words = BITSET_WORDS(num_instrs);
> >> > +
> >> > + state.impl = impl;
> >> > + state.instr = NULL;
> >> > + state.visited = rzalloc_array(NULL, BITSET_WORD, instr_words);
> >> > + state.pinned = rzalloc_array(NULL, BITSET_WORD, instr_words);
> >> > + exec_list_make_empty(&state.instrs);
> >> > + state.blocks = rzalloc_array(NULL, struct gcm_block_info,
> >> > impl->num_blocks);
> >> > +
> >> > + nir_metadata_require(impl, nir_metadata_block_index |
> >> > + nir_metadata_dominance);
> >> > +
> >> > + gcm_build_block_info(&impl->body, &state, 0);
> >> > + nir_foreach_block(impl, gcm_pin_instructions_block, &state);
> >> > +
> >> > + nir_foreach_block(impl, gcm_schedule_early_block, &state);
> >> > +
> >> > + memset(state.visited, 0, instr_words * sizeof(*state.visited));
> >> > + nir_foreach_block(impl, gcm_schedule_late_block, &state);
> >> > +
> >> > + memset(state.visited, 0, instr_words * sizeof(*state.visited));
> >> > + while (!exec_list_is_empty(&state.instrs)) {
> >> > + nir_instr *instr = exec_node_data(nir_instr,
> >> > + state.instrs.tail_pred,
> node);
> >> > + gcm_place_instr(instr, &state);
> >> > + }
> >> > +
> >> > +   ralloc_free(state.visited);
> >> > +   ralloc_free(state.pinned);
> >> > + ralloc_free(state.blocks);
> >> > +}
> >> > +
> >> > +void
> >> > +nir_opt_gcm(nir_shader *shader)
> >> > +{
> >> > + nir_foreach_overload(shader, overload) {
> >> > + if (overload->impl)
> >> > + opt_gcm_impl(overload->impl);
> >> > + }
> >> > +}
> >> > --
> >> > 2.2.2
> >> >