[Mesa-dev] [PATCH 02/13] nir: limit copy propagation inside loops
Connor Abbott
cwabbott0 at gmail.com
Tue Aug 30 01:13:44 UTC 2016
On Mon, Aug 29, 2016 at 2:30 AM, Timothy Arceri
<timothy.arceri at collabora.com> wrote:
> On Mon, 2016-08-29 at 01:30 -0400, Connor Abbott wrote:
>> On Mon, Aug 29, 2016 at 12:54 AM, Timothy Arceri
>> <timothy.arceri at collabora.com> wrote:
>> >
>> > Don't do copy propagation inside loops until after we try
>> > unrolling them.
>> >
>> > This helps avoid propagating everything to the phis which
>> > makes loop unrolling more difficult.
>> >
>> > For example without this:
>> >
>> > loop {
>> > block block_1:
>> > /* preds: block_0 block_4 */
>> > vec1 32 ssa_10 = phi block_0: ssa_5, block_4: ssa_15
>> > vec1 32 ssa_11 = phi block_0: ssa_6, block_4: ssa_17
>> > vec1 32 ssa_12 = phi block_0: ssa_7, block_4: ssa_18
>> > vec1 32 ssa_13 = phi block_0: ssa_8, block_4: ssa_19
>> > vec1 32 ssa_14 = phi block_0: ssa_9, block_4: ssa_20
>> > vec1 32 ssa_15 = iadd ssa_10, ssa_2
>> > vec1 32 ssa_16 = ige ssa_15, ssa_1
>> > /* succs: block_2 block_3 */
>> > if ssa_16 {
>> > block block_2:
>> > /* preds: block_1 */
>> > break
>> > /* succs: block_5 */
>> > } else {
>> > block block_3:
>> > /* preds: block_1 */
>> > /* succs: block_4 */
>> > }
>> > block block_4:
>> > /* preds: block_3 */
>> > vec1 32 ssa_17 = imov ssa_12
>> > vec1 32 ssa_18 = imov ssa_13
>> > vec1 32 ssa_19 = imov ssa_14
>> > vec1 32 ssa_20 = imov ssa_11
>> > /* succs: block_1 */
>> > }
>> >
>> > Will end up as:
>> >
>> > loop {
>> > /* preds: block_0 block_4 */
>> > block block_1:
>> > vec1 32 ssa_10 = phi block_0: ssa_5, block_4: ssa_15
>> > vec1 32 ssa_11 = phi block_0: ssa_6, block_4: ssa_12
>> > vec1 32 ssa_12 = phi block_0: ssa_7, block_4: ssa_13
>> > vec1 32 ssa_13 = phi block_0: ssa_8, block_4: ssa_14
>> > vec1 32 ssa_14 = phi block_0: ssa_9, block_4: ssa_11
>> > vec1 32 ssa_15 = iadd ssa_10, ssa_2
>> > vec1 32 ssa_16 = ige ssa_15, ssa_1
>> > /* succs: block_2 block_3 */
>> > if ssa_16 {
>> > block block_2:
>> > /* preds: block_1 */
>> > break
>> > /* succs: block_5 */
>> > } else {
>> > block block_3:
>> > /* preds: block_1 */
>> > /* succs: block_4 */
>> > }
>> > block block_4:
>> > /* preds: block_3 */
>> > /* succs: block_1 */
>> > }
>>
>> This change seems really fishy to me, since moves like those in your
>> example are just a trivial renaming of values.
>
> If you look closely this is not just a trivial renaming of values. This
> example was from a loop that looked something like this:
>
> int i = 0;
> while (i < 2) {
> a = a.yzwx;
> }
>
Yes, it is a trivial renaming of values. What's not trivial is that in
SSA, you can have a loop like this:
orig_a = ...
orig_b = ...
loop {
a = phi(b, orig_a);
b = phi(a, orig_b);
...
}
in which case a and b are swapped each iteration. In other words, phi
node copies are defined to happen in parallel. But this is just a fact
of life when dealing with phi nodes, and I sent a comment to patch 10
that explains how you can change your pass to handle it.
>> Just turning off copy
>> propagation isn't a very robust solution (what if whatever's
>> producing
>> the NIR doesn't insert the moves in the first place?),
>
> I haven't seen this happen when testing shader-db or any of the test
> suites.
The point isn't that anything currently does this, it's that your pass
is depending on other details that it shouldn't be relying on.
>
>> and it can hurt
>> other optimization passes if they can't see through moves.
>
> While true loops in glsl tend to be fairly simple. I haven't been able
> to detect any negative results in my copy of shader-db. Also we should
> be unrolling a high percentage of loops we come across which means copy
> propagation will kick in once they are unrolled.
>
>> Why is the
>> second loop harder to unroll?
>
> Because manipulating the control flow in NIR is painful. When the movs
> remain in place we can simply clone them as we unroll.
You don't need to manipulate the control flow to handle it.
>
>> Why can't it be handled by inserting
>> moves before unrolling the loop, or directly during the loop
>> unrolling
>> pass?
>
> We could possibly do this and I did attempt it at first, but it would
> mean an extra pass over all phi's in the loop header that would likely
> be expensive and complicated. So instead I went with the simple
> solution that had no measurable impact as far as I could tell.
Actually, the pass wouldn't be that complicated. But it shouldn't be
necessary anways (see above).
>
>>
>> >
>> > ---
>> > src/compiler/nir/nir.h | 2 +-
>> > src/compiler/nir/nir_opt_copy_propagate.c | 47
>> > ++++++++++++++++++++-----------
>> > src/mesa/drivers/dri/i965/brw_nir.c | 6 ++--
>> > 3 files changed, 35 insertions(+), 20 deletions(-)
>> >
>> > diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
>> > index d0dfb0d..7ff5394 100644
>> > --- a/src/compiler/nir/nir.h
>> > +++ b/src/compiler/nir/nir.h
>> > @@ -2573,7 +2573,7 @@ bool nir_opt_constant_folding(nir_shader
>> > *shader);
>> >
>> > bool nir_opt_global_to_local(nir_shader *shader);
>> >
>> > -bool nir_copy_prop(nir_shader *shader);
>> > +bool nir_copy_prop(nir_shader *shader, bool prop_loops);
>> >
>> > bool nir_opt_cse(nir_shader *shader);
>> >
>> > diff --git a/src/compiler/nir/nir_opt_copy_propagate.c
>> > b/src/compiler/nir/nir_opt_copy_propagate.c
>> > index c26e07f..12daeb6 100644
>> > --- a/src/compiler/nir/nir_opt_copy_propagate.c
>> > +++ b/src/compiler/nir/nir_opt_copy_propagate.c
>> > @@ -99,11 +99,14 @@ is_swizzleless_move(nir_alu_instr *instr)
>> > }
>> >
>> > static bool
>> > -copy_prop_src(nir_src *src, nir_instr *parent_instr, nir_if
>> > *parent_if)
>> > +copy_prop_src(nir_src *src, nir_instr *parent_instr, nir_if
>> > *parent_if,
>> > + bool prop_loops)
>> > {
>> > if (!src->is_ssa) {
>> > - if (src->reg.indirect)
>> > - return copy_prop_src(src->reg.indirect, parent_instr,
>> > parent_if);
>> > + if (src->reg.indirect) {
>> > + return copy_prop_src(src->reg.indirect, parent_instr,
>> > parent_if,
>> > + prop_loops);
>> > + }
>> > return false;
>> > }
>> >
>> > @@ -125,6 +128,14 @@ copy_prop_src(nir_src *src, nir_instr
>> > *parent_instr, nir_if *parent_if)
>> > if (phi->dest.ssa.num_components !=
>> > alu_instr->src[0].src.ssa->num_components)
>> > return false;
>> > +
>> > + /* Avoid propagating moves inside a loop into phis which
>> > makes
>> > + * unrolling difficult.
>> > + */
>> > + if (!prop_loops) {
>> > + if (phi->instr.block->cf_node.parent->type ==
>> > nir_cf_node_loop)
>> > + return false;
>> > + }
>> > }
>> >
>> > if (parent_instr) {
>> > @@ -140,13 +151,14 @@ copy_prop_src(nir_src *src, nir_instr
>> > *parent_instr, nir_if *parent_if)
>> > }
>> >
>> > static bool
>> > -copy_prop_alu_src(nir_alu_instr *parent_alu_instr, unsigned index)
>> > +copy_prop_alu_src(nir_alu_instr *parent_alu_instr, unsigned index,
>> > + bool prop_loops)
>> > {
>> > nir_alu_src *src = &parent_alu_instr->src[index];
>> > if (!src->src.is_ssa) {
>> > if (src->src.reg.indirect)
>> > return copy_prop_src(src->src.reg.indirect,
>> > &parent_alu_instr->instr,
>> > - NULL);
>> > + NULL, prop_loops);
>> > return false;
>> > }
>> >
>> > @@ -195,6 +207,7 @@ copy_prop_alu_src(nir_alu_instr
>> > *parent_alu_instr, unsigned index)
>> >
>> > typedef struct {
>> > nir_instr *parent_instr;
>> > + bool prop_loops;
>> > bool progress;
>> > } copy_prop_state;
>> >
>> > @@ -202,25 +215,26 @@ static bool
>> > copy_prop_src_cb(nir_src *src, void *_state)
>> > {
>> > copy_prop_state *state = (copy_prop_state *) _state;
>> > - while (copy_prop_src(src, state->parent_instr, NULL))
>> > + while (copy_prop_src(src, state->parent_instr, NULL, state-
>> > >prop_loops))
>> > state->progress = true;
>> >
>> > return true;
>> > }
>> >
>> > static bool
>> > -copy_prop_instr(nir_instr *instr)
>> > +copy_prop_instr(nir_instr *instr, bool prop_loops)
>> > {
>> > if (instr->type == nir_instr_type_alu) {
>> > nir_alu_instr *alu_instr = nir_instr_as_alu(instr);
>> > bool progress = false;
>> >
>> > for (unsigned i = 0; i < nir_op_infos[alu_instr-
>> > >op].num_inputs; i++)
>> > - while (copy_prop_alu_src(alu_instr, i))
>> > + while (copy_prop_alu_src(alu_instr, i, prop_loops))
>> > progress = true;
>> >
>> > if (!alu_instr->dest.dest.is_ssa && alu_instr-
>> > >dest.dest.reg.indirect)
>> > - while (copy_prop_src(alu_instr->dest.dest.reg.indirect,
>> > instr, NULL))
>> > + while (copy_prop_src(alu_instr->dest.dest.reg.indirect,
>> > instr, NULL,
>> > + prop_loops))
>> > progress = true;
>> >
>> > return progress;
>> > @@ -229,30 +243,31 @@ copy_prop_instr(nir_instr *instr)
>> > copy_prop_state state;
>> > state.parent_instr = instr;
>> > state.progress = false;
>> > + state.prop_loops = prop_loops;
>> > nir_foreach_src(instr, copy_prop_src_cb, &state);
>> >
>> > return state.progress;
>> > }
>> >
>> > static bool
>> > -copy_prop_if(nir_if *if_stmt)
>> > +copy_prop_if(nir_if *if_stmt, bool prop_loops)
>> > {
>> > - return copy_prop_src(&if_stmt->condition, NULL, if_stmt);
>> > + return copy_prop_src(&if_stmt->condition, NULL, if_stmt,
>> > prop_loops);
>> > }
>> >
>> > static bool
>> > -nir_copy_prop_impl(nir_function_impl *impl)
>> > +nir_copy_prop_impl(nir_function_impl *impl, bool prop_loops)
>> > {
>> > bool progress = false;
>> >
>> > nir_foreach_block(block, impl) {
>> > nir_foreach_instr(instr, block) {
>> > - if (copy_prop_instr(instr))
>> > + if (copy_prop_instr(instr, prop_loops))
>> > progress = true;
>> > }
>> >
>> > nir_if *if_stmt = nir_block_get_following_if(block);
>> > - if (if_stmt && copy_prop_if(if_stmt))
>> > + if (if_stmt && copy_prop_if(if_stmt, prop_loops))
>> > progress = true;
>> > }
>> >
>> > @@ -265,12 +280,12 @@ nir_copy_prop_impl(nir_function_impl *impl)
>> > }
>> >
>> > bool
>> > -nir_copy_prop(nir_shader *shader)
>> > +nir_copy_prop(nir_shader *shader, bool prop_loops)
>> > {
>> > bool progress = false;
>> >
>> > nir_foreach_function(function, shader) {
>> > - if (function->impl && nir_copy_prop_impl(function->impl))
>> > + if (function->impl && nir_copy_prop_impl(function->impl,
>> > prop_loops))
>> > progress = true;
>> > }
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/brw_nir.c
>> > b/src/mesa/drivers/dri/i965/brw_nir.c
>> > index 29a3e3e..b8457d2 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_nir.c
>> > +++ b/src/mesa/drivers/dri/i965/brw_nir.c
>> > @@ -378,13 +378,13 @@ nir_optimize(nir_shader *nir, bool is_scalar)
>> > OPT_V(nir_lower_alu_to_scalar);
>> > }
>> >
>> > - OPT(nir_copy_prop);
>> > + OPT(nir_copy_prop, false);
>> >
>> > if (is_scalar) {
>> > OPT_V(nir_lower_phis_to_scalar);
>> > }
>> >
>> > - OPT(nir_copy_prop);
>> > + OPT(nir_copy_prop, false);
>> > OPT(nir_opt_dce);
>> > OPT(nir_opt_cse);
>> > OPT(nir_opt_peephole_select);
>> > @@ -499,7 +499,7 @@ brw_postprocess_nir(nir_shader *nir,
>> > OPT(nir_lower_locals_to_regs);
>> >
>> > OPT_V(nir_lower_to_source_mods);
>> > - OPT(nir_copy_prop);
>> > + OPT(nir_copy_prop, true);
>> > OPT(nir_opt_dce);
>> >
>> > if (unlikely(debug_enabled)) {
>> > --
>> > 2.7.4
>> >
>> > _______________________________________________
>> > mesa-dev mailing list
>> > mesa-dev at lists.freedesktop.org
>> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
More information about the mesa-dev
mailing list