[Mesa-dev] [PATCH 9/9] i965: Use NIR by default for Fragment shaders on SNB+
Connor Abbott
cwabbott0 at gmail.com
Wed Mar 18 08:42:22 PDT 2015
On Wed, Mar 18, 2015 at 11:27 AM, Jason Ekstrand <jason at jlekstrand.net> wrote:
> On Wed, Mar 18, 2015 at 7:22 AM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>> On Wed, Mar 18, 2015 at 1:56 AM, Matt Turner <mattst88 at gmail.com> wrote:
>>> Thanks. Looked through stats and at some of the regressions.
>>>
>>> Some of the areas I noticed we were doing worse:
>>>
>>> We generate two CMPs for discard_if; only one without NIR. I think you
>>> had an idea about solving this.
>>>
>>> SEL peephole interference -- we knew about this in general, but I
>>> found an interesting failure mode:
>>>
>>> NIR generates
>>>
>>> (+f0) if(8) JIP: 4 UIP: 5
>>> else(8) JIP: 3
>>> mov(8) g124<1>F -g124<8,8,1>F
>>> endif(8) JIP: 6
>>>
>>> We should of course improve dead-control-flow-elimination to remove
>>> the else and set the if's predicate_inverse, but non-NIR generates
>>>
>>> (+f0) sel(8) g124<1>F g25<8,8,1>F -g25<8,8,1>F
>>>
>>> which is probably better even than a predicated MOV since it's not a
>>> partial write and subject to optimization restrictions. The assembly
>>> matches the NIR exactly, so I'm assuming NIR is removing a
>>> self-assignment in the if-then block, preventing it from turning it
>>> into a conditional select.
>>
>> I think this is because the sel peephole (along with all the other
>> optimizations) in NIR only works before we lower to source mods, where
>> this code probably looked something like:
>>
>> ssa_1 = ...;
>> if ... {
>> } else {
>> ssa_2 = fneg ssa_1;
>> }
>> ssa_3 = phi ssa_1, ssa_2;
>
> More specifically, we probably got
>
> b = foo ? a : -a
>
> from the GLSL source which NIR turns into
>
> if (foo) {
> ssa_1 = a;
> } else {
> ssa_2 = -a;
> }
> ssa_3 = phi(ssa_1, ssa_2)
>
> which gets copy-propagated to what Connor mentioned above.
>
>> So we can probably fix it just by adding fneg/ineg to
>> block_check_for_allowed_instrs().
>
> That was my first thought but we're starting to get into the world of
> backend-specific knowledge there. Maybe this is where we just need to
> pass in Eric's backend info struct and predicate doing so on whether
> or not neg is free? But I guess the only compiler we care about at
> the moment where that isn't true is vc4 which can't do control-flow
> right now anyway. Thoughts?
>
> We may also want to simply make the backend's peephole smarter.
I thought about predicating it on !lower_neg_to_sub, but if a backend
sets that then we'll already have lowered the negates to subtracts. We
could have a separate "my backend uses neg, but it's not actually
free" thing, or add a general "how much does this thing cost" callback
to nir_shader_compiler_options, but it feels like overkill to me right
now. Maybe we'll need it later, though...
>
>>>
>>> Also, since the fs's SEL peephole only recognizes MOVs to the same
>>> destination in order, it doesn't handle tropico-5/387 properly, which
>>> already improved from 110 to 79 instructions.
>>>
>>> I also noticed an interesting pattern in defense-grid/537 (and some
>>> others, I think). NIR generates
>>>
>>> mul(8) g4<1>F g2<8,8,1>F g3<8,8,1>F
>>> mul(8) g6<1>F g5<8,8,1>F g4<8,8,1>F
>>> mad(8) g127<1>F g6<4,4,1>F g3<4,4,1>F g2<4,4,1>F
>>>
>>> whereas without NIR we get:
>>>
>>> mul(8) g5<1>F g2<8,8,1>F g4<8,8,1>F
>>> mad(8) g127<1>F g5<4,4,1>F g3<4,4,1>F g5<4,4,1>F
>>>
>>> Again, the assembly matches the NIR, so NIR is doing something bad.
>>>
>>> The changes to sun-temple/209 (and a lot of other sun temple shaders)
>>> look wrong. I'm not sure if the translation from NIR is doing
>>> something bad or what, but we should have 8 texture fetches and we
>>> only end up with 2.
>>>
>>>
>>> Some highlights --
>>>
>>> Some shaders are improved because NIR recognizes exp2(log(x) * y) as
>>> pow(x, y) when the GLSL optimizations couldn't, because its dead code
>>> elimination is much too stupid to allow tree grafting to put the IR in
>>> the form we expect. This accounts for a low of the high percentage
>>> gains.
>>>
>>> It cuts some Kerbal Space Program shaders in half (~1000 -> ~500
>>> instructions) and removes ~100 spills and ~100 fills from each. Not
>>> sure how it actually accomplished this. Too hard to read the assembly
>>> with all of the spills.
>>>
>>> Code generated by tropico-5/252 is really stupid. We MOVs both before
>>> and in an if statement that do the same thing. NIR cleans that up
>>> easily as a side-effect of SSA.
>>>
>>> In strike-suit-zero/156 we emit more MADs that we couldn't recognize
>>> from the GLSL IR, since again, its DCE is terrible and we can't
>>> combine trees.
>>>
>>> In the-binding-of-isaac-rebirth/18, it's able to CSE some duplicate
>>> rcp expressions in different basic blocks. I'm not sure why the
>>> backend isn't doing this.
>>>
>>>
>>> At this point I've looked through all of the shaders helped by >15% or more.
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev at lists.freedesktop.org
>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
More information about the mesa-dev
mailing list