[Mesa-dev] [PATCH 4/4] nir: detect more induction variables

Fri Nov 30 22:08:31 UTC 2018

On 30/11/18 5:38 pm, Thomas Helland wrote:
> Den ons. 28. nov. 2018 kl. 10:23 skrev Timothy Arceri <tarceri at itsqueeze.com>:
>>
>> On 28/11/18 6:52 pm, Thomas Helland wrote:
>>> Den ons. 28. nov. 2018 kl. 04:26 skrev Timothy Arceri <tarceri at itsqueeze.com>:
>>>>
>>>> This adds allows loop analysis to detect inductions varibales that
>>>> are incremented in both branches of an if rather than in a main
>>>> loop block. For example:
>>>>
>>>>      loop {
>>>>         block block_1:
>>>>         /* preds: block_0 block_7 */
>>>>         vec1 32 ssa_8 = phi block_0: ssa_4, block_7: ssa_20
>>>>         vec1 32 ssa_9 = phi block_0: ssa_0, block_7: ssa_4
>>>>         vec1 32 ssa_10 = phi block_0: ssa_1, block_7: ssa_4
>>>>         vec1 32 ssa_11 = phi block_0: ssa_2, block_7: ssa_21
>>>>         vec1 32 ssa_12 = phi block_0: ssa_3, block_7: ssa_22
>>>>         vec4 32 ssa_13 = vec4 ssa_12, ssa_11, ssa_10, ssa_9
>>>>         vec1 32 ssa_14 = ige ssa_8, ssa_5
>>>>         /* succs: block_2 block_3 */
>>>>         if ssa_14 {
>>>>            block block_2:
>>>>            /* preds: block_1 */
>>>>            break
>>>>            /* succs: block_8 */
>>>>         } else {
>>>>            block block_3:
>>>>            /* preds: block_1 */
>>>>            /* succs: block_4 */
>>>>         }
>>>>         block block_4:
>>>>         /* preds: block_3 */
>>>>         vec1 32 ssa_15 = ilt ssa_6, ssa_8
>>>>         /* succs: block_5 block_6 */
>>>>         if ssa_15 {
>>>>            block block_5:
>>>>            /* preds: block_4 */
>>>>            vec1 32 ssa_16 = iadd ssa_8, ssa_7
>>>>            vec1 32 ssa_17 = load_const (0x3f800000 /* 1.000000*/)
>>>>            /* succs: block_7 */
>>>>         } else {
>>>>            block block_6:
>>>>            /* preds: block_4 */
>>>>            vec1 32 ssa_18 = iadd ssa_8, ssa_7
>>>>            vec1 32 ssa_19 = load_const (0x3f800000 /* 1.000000*/)
>>>>            /* succs: block_7 */
>>>>         }
>>>>         block block_7:
>>>>         /* preds: block_5 block_6 */
>>>>         vec1 32 ssa_20 = phi block_5: ssa_16, block_6: ssa_18
>>>>         vec1 32 ssa_21 = phi block_5: ssa_17, block_6: ssa_4
>>>>         vec1 32 ssa_22 = phi block_5: ssa_4, block_6: ssa_19
>>>>         /* succs: block_1 */
>>>>      }
>>>>
>>>> Unfortunatly GCM could move the addition out of the if for us
>>>> (making this patch unrequired) but we still cannot enable the GCM
>>>> pass without regressions.
>>>>
>>>
>>> Just some questions / suggestions from my side for now.
>>> I'll try to take a closer look at the patch later today.
>>>
>>> While GCM would be nice, to me it seems that adding an
>>> if-opt instead, that pulls common code from both branches
>>> of an if out of the if on a more general basis, would get us
>>> this, plus a bunch of other benefits? As far as I can see there
>>> should never be negative impacts from pulling common code
>>> out like that, but I might be wrong. Did you look into that?
>>> I bet out did, I'm just interested in how that worked out.
>>
>> I didn't attempt this because pulling code out of the ifs can increase
>> register pressure. This is one of the problems we have with the GCM pass
>> currently, so for now I chose a more conservative approach.
>>
> 
> Yeah, of course. I'm being dumb. It looks better in source code,
> but as long as it does not lead to other optimizations it will only
> cause the live range of the add to intersect with that of the branch
> condition. The same amount of instructions will be executed
> either way.
> 
>>>
>>> Since GCM is not yet where we want it to be, maybe we'd
>>> want to implement LICM? That obviously does not come
>>> into play with what this patch adresses, but it might help
>>> get a more accurate estimate of the cost/benefit of unrolling?
>>> (Invariant computations that will be CSE'd will not be
>>> counted multiple times). This might already be accounted
>>> for by counting the invariant computations only once?
>>
>> No we don't do anything like this currently. The GCM pass can pull
>> things out of loops also, but again we hit register pressure issues with
>> that pass.
>>
>> As far as I'm aware reducing invariants is not where we get most of our
>> wins from with unrolling. Removing indirect array access, improving
>> opportunities for constant folding (and a bunch of other passes), being
>> able to evaluate the unfolded loop with the surrounding code etc all
>> result in greater benefits.
>>
>> With the limits we place on making sure we don't unroll large loops that
>> are going to cause register use issues, nobody has yet been able to show
>> that always unrolling loops is causing any harm, and it's certainly been
>> shown to help :)
> 
> Thanks for taking the time with my stupidity =) I'll try to take a look at
> these patches later tonight =)

All valid questions :) Thanks for taking a look.