[Mesa-dev] [PATCH 4/4] nir: detect more induction variables

Fri Nov 30 06:38:18 UTC 2018

Den ons. 28. nov. 2018 kl. 10:23 skrev Timothy Arceri <tarceri at itsqueeze.com>:
>
> On 28/11/18 6:52 pm, Thomas Helland wrote:
> > Den ons. 28. nov. 2018 kl. 04:26 skrev Timothy Arceri <tarceri at itsqueeze.com>:
> >>
> >> This adds allows loop analysis to detect inductions varibales that
> >> are incremented in both branches of an if rather than in a main
> >> loop block. For example:
> >>
> >>     loop {
> >>        block block_1:
> >>        /* preds: block_0 block_7 */
> >>        vec1 32 ssa_8 = phi block_0: ssa_4, block_7: ssa_20
> >>        vec1 32 ssa_9 = phi block_0: ssa_0, block_7: ssa_4
> >>        vec1 32 ssa_10 = phi block_0: ssa_1, block_7: ssa_4
> >>        vec1 32 ssa_11 = phi block_0: ssa_2, block_7: ssa_21
> >>        vec1 32 ssa_12 = phi block_0: ssa_3, block_7: ssa_22
> >>        vec4 32 ssa_13 = vec4 ssa_12, ssa_11, ssa_10, ssa_9
> >>        vec1 32 ssa_14 = ige ssa_8, ssa_5
> >>        /* succs: block_2 block_3 */
> >>        if ssa_14 {
> >>           block block_2:
> >>           /* preds: block_1 */
> >>           break
> >>           /* succs: block_8 */
> >>        } else {
> >>           block block_3:
> >>           /* preds: block_1 */
> >>           /* succs: block_4 */
> >>        }
> >>        block block_4:
> >>        /* preds: block_3 */
> >>        vec1 32 ssa_15 = ilt ssa_6, ssa_8
> >>        /* succs: block_5 block_6 */
> >>        if ssa_15 {
> >>           block block_5:
> >>           /* preds: block_4 */
> >>           vec1 32 ssa_16 = iadd ssa_8, ssa_7
> >>           vec1 32 ssa_17 = load_const (0x3f800000 /* 1.000000*/)
> >>           /* succs: block_7 */
> >>        } else {
> >>           block block_6:
> >>           /* preds: block_4 */
> >>           vec1 32 ssa_18 = iadd ssa_8, ssa_7
> >>           vec1 32 ssa_19 = load_const (0x3f800000 /* 1.000000*/)
> >>           /* succs: block_7 */
> >>        }
> >>        block block_7:
> >>        /* preds: block_5 block_6 */
> >>        vec1 32 ssa_20 = phi block_5: ssa_16, block_6: ssa_18
> >>        vec1 32 ssa_21 = phi block_5: ssa_17, block_6: ssa_4
> >>        vec1 32 ssa_22 = phi block_5: ssa_4, block_6: ssa_19
> >>        /* succs: block_1 */
> >>     }
> >>
> >> Unfortunatly GCM could move the addition out of the if for us
> >> (making this patch unrequired) but we still cannot enable the GCM
> >> pass without regressions.
> >>
> >
> > Just some questions / suggestions from my side for now.
> > I'll try to take a closer look at the patch later today.
> >
> > While GCM would be nice, to me it seems that adding an
> > if-opt instead, that pulls common code from both branches
> > of an if out of the if on a more general basis, would get us
> > this, plus a bunch of other benefits? As far as I can see there
> > should never be negative impacts from pulling common code
> > out like that, but I might be wrong. Did you look into that?
> > I bet out did, I'm just interested in how that worked out.
>
> I didn't attempt this because pulling code out of the ifs can increase
> register pressure. This is one of the problems we have with the GCM pass
> currently, so for now I chose a more conservative approach.
>

Yeah, of course. I'm being dumb. It looks better in source code,
but as long as it does not lead to other optimizations it will only
cause the live range of the add to intersect with that of the branch
condition. The same amount of instructions will be executed
either way.

> >
> > Since GCM is not yet where we want it to be, maybe we'd
> > want to implement LICM? That obviously does not come
> > into play with what this patch adresses, but it might help
> > get a more accurate estimate of the cost/benefit of unrolling?
> > (Invariant computations that will be CSE'd will not be
> > counted multiple times). This might already be accounted
> > for by counting the invariant computations only once?
>
> No we don't do anything like this currently. The GCM pass can pull
> things out of loops also, but again we hit register pressure issues with
> that pass.
>
> As far as I'm aware reducing invariants is not where we get most of our
> wins from with unrolling. Removing indirect array access, improving
> opportunities for constant folding (and a bunch of other passes), being
> able to evaluate the unfolded loop with the surrounding code etc all
> result in greater benefits.
>
> With the limits we place on making sure we don't unroll large loops that
> are going to cause register use issues, nobody has yet been able to show
> that always unrolling loops is causing any harm, and it's certainly been
> shown to help :)

Thanks for taking the time with my stupidity =) I'll try to take a look at
these patches later tonight =)