[Mesa-dev] [PATCH] i965: Use NIR by default for vertex shaders on GEN8+

Matt Turner mattst88 at gmail.com
Sat May 16 12:59:59 PDT 2015

On Sat, May 16, 2015 at 12:45 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
> On Sat, May 16, 2015 at 12:12 PM, Matt Turner <mattst88 at gmail.com> wrote:
>> On Fri, May 8, 2015 at 3:27 AM, Kenneth Graunke <kenneth at whitecape.org> wrote:
>>> Looking at a couple of the shaders that are still worse off...it looks
>>> like a ton of Source shaders used to do MUL/ADD with an attribute and
>>> two immediates, and now are doing MOV/MOV/MAD.
>> I just looked, and thought that too for a minute, but it actually
>> shouldn't be doing that. Take for instance:
>> shaders/closed/steam/dota-2/498.shader_test VS SIMD8: 47 -> 53 (12.77%)
>> It indeed replaces 6x MUL/ADD pairs with MOV/MAD (introducing 6 extra
>> MOVs), but....
>> Without NIR we have
>> mul(8)          g15<1>F         g6<8,8,1>F      6F
>> ...
>> add(8)          g16<1>F         g15<8,8,1>F     2.1F
>> add(8)          g35<1>F         g15<8,8,1>F     3.1F
>> add(8)          g42<1>F         g15<8,8,1>F     4.1F
>> add(8)          g45<1>F         g15<8,8,1>F     5.1F
>> add(8)          g48<1>F         g15<8,8,1>F     0.1F
>> add(8)          g51<1>F         g15<8,8,1>F     1.1F
>> That is, one multiply is consumed by 6 adds.
>> With NIR we have
>> mov(1)          g22<1>F         2.1F
>> mov(1)          g22.1<1>F       6F
>> mad(8)          g16<1>F         g22<0,1,0>.xF   g22.1<0,1,0>.xF g6<4,4,1>F
>> mov(1)          g22.2<1>F       3.1F
>> mad(8)          g23<1>F         g22.2<0,1,0>.xF g22.1<0,1,0>.xF g6<4,4,1>F
>> mov(1)          g22.3<1>F       4.1F
>> mad(8)          g30<1>F         g22.3<0,1,0>.xF g22.1<0,1,0>.xF g6<4,4,1>F
>> mov(1)          g22.4<1>F       5.1F
>> mad(8)          g33<1>F         g22.4<0,1,0>.xF g22.1<0,1,0>.xF g6<4,4,1>F
>> mov(1)          g22.5<1>F       0.1F
>> mad(8)          g36<1>F         g22.5<0,1,0>.xF g22.1<0,1,0>.xF g6<4,4,1>F
>> mov(1)          g22.6<1>F       1.1F
>> mad(8)          g39<1>F         g22.6<0,1,0>.xF g22.1<0,1,0>.xF g6<4,4,1>F
>> So we're doing the g6 * 6F operation 6 times! We see this in the NIR as well:
>>         vec1 ssa_419 = ffma ssa_384, ssa_132, ssa_133
>>         vec1 ssa_423 = ffma ssa_384, ssa_132, ssa_135
>>         vec1 ssa_427 = ffma ssa_384, ssa_132, ssa_137
>>         vec1 ssa_428 = ffma ssa_384, ssa_132, ssa_139
>>         vec1 ssa_429 = ffma ssa_384, ssa_132, ssa_141
>>         vec1 ssa_430 = ffma ssa_384, ssa_132, ssa_144
>> Whoops. Ideas for fixing that? I'm guessing that this accounts for
>> nearly all of the remaining 1120 hurt programs.
> Ugh... We've been tacitly assuming that your constant combine stuff
> will magically make immediates not a problem.  In this case, they are
> a problem.  I guess we could do something different for 1 vs. 2
> immediates.

That's not really the problem as far as I see. I mean, we could split
MADs that do x * imm + imm, but I would think NIR shouldn't be
combining these operations if the multiply is used in a bunch of

The current code in the ffma peephole does... to quote the comment:

      /* Only absorb a fmul into a ffma if the fmul is only used in fadd
       * operations.  This prevents us from being too aggressive with our
       * fusing which can actually lead to more instructions.

Can't we pretty trivially modify that to count the number of uses as
well and only combine if it's used in one place?

To be honest, before I looked in the code I thought that's what it was doing.
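
For illustration, here is a rough sketch of the "count the uses, only combine if there's one" rule. It runs on a made-up toy IR, not on NIR: the toy_instr struct and the toy_count_uses and toy_fuse_single_use_fmul helpers are hypothetical names invented for this example, and none of it is the actual peephole code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy IR for illustration only (an assumption, not Mesa's NIR types).
 * Each instruction names the instructions that define its sources by
 * index; -1 means the source slot is unused.
 */
enum toy_op { TOY_CONST, TOY_FMUL, TOY_FADD, TOY_FFMA };

struct toy_instr {
   enum toy_op op;
   int src[3];
};

/* Count how many source slots in the whole program read instruction def. */
static int
toy_count_uses(const struct toy_instr *prog, size_t n, int def)
{
   int uses = 0;
   for (size_t i = 0; i < n; i++) {
      for (int s = 0; s < 3; s++) {
         if (prog[i].src[s] == def)
            uses++;
      }
   }
   return uses;
}

/* Rewrite fadd(fmul(a, b), c) into ffma(a, b, c), but only when the fmul
 * has exactly one use.  A multiply shared by several adds is left alone,
 * so it is still computed once.  The fused-away fmul is not deleted here;
 * a later dead-code pass is assumed to clean it up.
 * Returns the number of fadds that were turned into ffmas.
 */
static int
toy_fuse_single_use_fmul(struct toy_instr *prog, size_t n)
{
   int fused = 0;
   for (size_t i = 0; i < n; i++) {
      if (prog[i].op != TOY_FADD)
         continue;

      for (int s = 0; s < 2; s++) {
         int mul = prog[i].src[s];
         if (mul < 0 || prog[mul].op != TOY_FMUL)
            continue;
         if (toy_count_uses(prog, n, mul) != 1)
            continue;

         int addend = prog[i].src[1 - s];
         prog[i].op = TOY_FFMA;
         prog[i].src[0] = prog[mul].src[0];
         prog[i].src[1] = prog[mul].src[1];
         prog[i].src[2] = addend;
         fused++;
         break;
      }
   }
   return fused;
}
```

With a rule like this, the dota-2 pattern above (one multiply feeding six adds) would stay as one MUL plus six ADDs, and only a multiply with a single consumer would become a MAD.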
