[Mesa-dev] [PATCH] nir/remove_phis: handle trivial back-edges

Connor Abbott cwabbott0 at gmail.com
Fri Oct 2 11:26:38 PDT 2015


On Fri, Oct 2, 2015 at 1:26 PM, Connor Abbott <cwabbott0 at gmail.com> wrote:
> On Fri, Oct 2, 2015 at 12:56 PM, Matt Turner <mattst88 at gmail.com> wrote:
>> On Fri, Oct 2, 2015 at 9:28 AM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>>> Some loops may have phi nodes that look like:
>>>
>>> foo = ...
>>> loop {
>>>     bar = phi(foo, bar)
>>>     ...
>>> }
>>>
>>> in which case we can remove the phi node and replace all uses of 'bar'
>>> with 'foo'. In particular, there are some L4D2 vertex shaders with loops
>>> that, after optimization, look like:
>>>
>>>         /* succs: block_1 */
>>>         loop {
>>>                 block block_1:
>>>                 /* preds: block_0 block_4 */
>>>                 vec1 ssa_2195 = phi block_0: ssa_2136, block_4: ssa_994
>>>                 vec1 ssa_7321 = phi block_0: ssa_8195, block_4: ssa_7321
>>>                 vec1 ssa_7324 = phi block_0: ssa_8198, block_4: ssa_7324
>>>                 vec1 ssa_7327 = phi block_0: ssa_8174, block_4: ssa_7327
>>>                 vec1 ssa_8139 = intrinsic load_uniform () () (232)
>>>                 vec1 ssa_588 = ige ssa_2195, ssa_8139
>>>                 /* succs: block_2 block_3 */
>>>                 if ssa_588 {
>>>                         block block_2:
>>>                         /* preds: block_1 */
>>>                         break
>>>                         /* succs: block_5 */
>>>                 } else {
>>>                         block block_3:
>>>                         /* preds: block_1 */
>>>                         /* succs: block_4 */
>>>                 }
>>>                 block block_4:
>>>                 /* preds: block_3 */
>>>                 vec1 ssa_994 = iadd ssa_2195, ssa_2150
>>>                 /* succs: block_1 */
>>>         }
>>>
>>> where after removing the second, third, and fourth phi nodes, the loop becomes
>>> entirely dead, and this patch will cause the loop to be deleted entirely.
>>>
>>> No piglit regressions.
>>>
>>> Shader-db results on bdw:
>>>
>>> instructions in affected programs:     5824 -> 5664 (-2.75%)
>>> helped:                                32
>>> HURT:                                  0
>>
>> I added loop-count support to report.py recently.
>>
>> instructions in affected programs:     5824 -> 5664 (-2.75%)
>> total loops in shared programs:        2234 -> 2202 (-1.43%)
>> helped:                                0
>>
>> (report.py doesn't count programs in which the number of loops changed
>> as helped/hurt because sometimes a loop is unrolled and the
>> instruction count triples)
>>
>> I'd feel fine about changing helped: 0 to helped: 32 like you have.
>>
>>>
>>> Signed-off-by: Connor Abbott <cwabbott0 at gmail.com>
>>> ---
>>> Seems like this one fell through the cracks...
>>>
>>>  src/glsl/nir/nir_opt_remove_phis.c | 20 ++++++++++++++++++++
>>>  1 file changed, 20 insertions(+)
>>>
>>> diff --git a/src/glsl/nir/nir_opt_remove_phis.c b/src/glsl/nir/nir_opt_remove_phis.c
>>> index bf4a67e..0a5793a 100644
>>> --- a/src/glsl/nir/nir_opt_remove_phis.c
>>> +++ b/src/glsl/nir/nir_opt_remove_phis.c
>>> @@ -58,6 +58,21 @@ remove_phis_block(nir_block *block, void *state)
>>>
>>>        nir_foreach_phi_src(phi, src) {
>>>           assert(src->src.is_ssa);
>>> +
>>> +         /* For phi nodes at the beginning of loops, we may encounter some
>>> +          * sources from backedges that point back to the destination of the
>>> +          * same phi, i.e. something like:
>>> +          *
>>> +          * a = phi(a, b, ...)
>>> +          *
>>> +          * We can safely ignore these sources, since if all of the normal
>>> +          * sources point to the same definition, then that definition must
>>> +          * still dominate the phi node, and the phi will still always take
>>> +          * the value of that definition.
>>> +          */
>>> +
>>
>> I'd remove the newline here.
>>
>>> +         if (src->src.ssa == &phi->dest.ssa)
>>> +            continue;
>>>
>>>           if (def == NULL) {
>>>              def  = src->src.ssa;
>>> @@ -72,6 +87,11 @@ remove_phis_block(nir_block *block, void *state)
>>>        if (!srcs_same)
>>>           continue;
>>>
>>> +      /* We must have found at least one definition, since there must be at
>>> +       * least one forward edge.
>>> +       */
>>> +      assert(def != NULL);
>>> +
>>>        assert(phi->dest.is_ssa);
>>>        nir_ssa_def_rewrite_uses(&phi->dest.ssa, nir_src_for_ssa(def));
>>>        nir_instr_remove(instr);
>>> --
>>> 2.1.0
>>
>> Look good to me. Thanks for remembering this.
>>
>> Reviewed-by: Matt Turner <mattst88 at gmail.com>
>
> Ok, I fixed the newline and the results and pushed it.
>
>>
>> One strange thing I noticed... this doesn't affect any programs on
>> Haswell. All the shaders are vertex shaders, so they're vec4 on
>> Haswell but scalar on Broadwell, so it's probably somehow related to
>> that.
>>
>> Want to take a look at what's going on with Haswell? Seems like if the
>> loop is dead... it should be dead.
>
> Yeah, I noticed that too. I'll look into it... it could just be that
> the loops were getting deleted before this patch anyways without the
> scalarizer.

Ok, so I looked at this a bit on my IVB. Specifically, I looked at
shaders/non-free/steam/left-4-dead-2/medium/3716.shader_test (although
there are a lot that could do). None of the loop gets eliminated :(,
and the culprit dragging in the whole thing appears to be ssa_119.
It's defined by a phi node at the beginning of the loop:

vec4 ssa_119 = phi block_0: ssa_103, block_4: ssa_159

and used twice inside the loop, like so:

vec3 ssa_127 = fadd -ssa_119, ssa_126
vec4 ssa_159 = vec4 ssa_119, ssa_119.y, ssa_119.z, ssa_158

and thrice outside the loop:

vec4 ssa_165 = vec4 ssa_119, ssa_119.y, ssa_119.z, ssa_0.w
vec2 ssa_166 = fadd -ssa_119, ssa_106
vec4 ssa_169 = vec4 ssa_119, ssa_119.y, ssa_119.z, ssa_48.y

Notice that nothing ever uses more than the first three components of
ssa_119! Also notice that the backedge of the phi node defining
ssa_119 comes from ssa_159, who's first three components come from
ssa_119. So DCE sees that ssa_119 is used, and therefore ssa_159 is
used, so therefore ssa_158 is used, and since ssa_158 is just some
other random thing it marks the entire loop as used. If we could
shrink ssa_119 to only 3 components, then DCE would delete the body of
the loop, this patch would kick in and remove ssa_119, and we could
delete the entire loop.

So, TL;DR: we need to have something for removing unused components of
vectors in order to delete the loop here. Of course, that would
probably help vec4 in lots of other ways too.

BTW, Eric did have a series to do DCE per-component a while ago:

http://cgit.freedesktop.org/~anholt/mesa/log/?h=nir-dce-components

so we could reuse that.


More information about the mesa-dev mailing list