[Mesa-dev] [PATCH] i965: Use NIR by default for vertex shaders on GEN8+

Fri May 8 10:08:35 PDT 2015

On Fri, May 8, 2015 at 3:27 AM, Kenneth Graunke <kenneth at whitecape.org> wrote:
> On Thursday, May 07, 2015 06:17:46 PM Matt Turner wrote:
>> On Thu, May 7, 2015 at 4:50 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>> > GLSL IR vs. NIR shader-db results for SIMD8 vertex shaders on Broadwell:
>> >
>> >    total instructions in shared programs: 2724483 -> 2711790 (-0.47%)
>> >    instructions in affected programs:     1860859 -> 1848166 (-0.68%)
>> >    helped:                                4387
>> >    HURT:                                  4758
>> >    GAINED:                                1499
>> >
>> > The gained programs are ARB vertex programs that were previously going
>> > through the vec4 backend.  Now that we have prog_to_nir, ARB vertex
>> > programs can go through the scalar backend so they show up as "gained" in
>> > the shader-db results.
>>
>> Again, I'm kind of confused and disappointed that we're just okay with
>> hurting 4700 programs without more analysis. I guess I'll go do
>> that...
>
> I took a stab at that tonight.  The good news is, the majority of the
> hurt is pretty stupid.  Indirect uniform address calculations involve
> a lot of integer multiplication by 4.
>
> For whatever reason, we're getting 4*x instead of x*4, and the first
> source can't be an immediate.  So we get:
>
>     MOV tmp 4
>     MUL dst tmp x
>
> Normally, constant propagation would commute the operands, giving us
> "MUL dst x 4" like we want.  But it sees integer multiplication and
> chickens out, due to the asymmetry on some platforms.

Right.  I just sent out a patch that puts immediates in src1 for
commutative ALU ops at the emit stage.  We probably still want to do
something in constant propagation in case we can fold something, but
it fixes the problem for now.  It also helps even more than the
shifting patch.
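
As a rough illustration of the idea (a minimal sketch, not the actual
patch): the types, opcode names, and the canonicalize_immediate() helper
below are all made up for the example and are not the i965 backend's real
structures.  The sketch swaps the sources of a commutative op when only
src0 holds an immediate, so the immediate ends up in src1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified operand and instruction types (not Mesa's). */
enum file { REG, IMM };
struct src { enum file file; int value; };
enum opcode { OP_ADD, OP_MUL, OP_SHL };
struct inst { enum opcode op; struct src src0, src1; };

/* Only ops whose result does not depend on source order may be swapped. */
static bool is_commutative(enum opcode op)
{
    return op == OP_ADD || op == OP_MUL;
}

/* If src0 is an immediate and src1 is not, swap them so the immediate
 * lands in src1.  Returns true when the instruction was changed. */
static bool canonicalize_immediate(struct inst *inst)
{
    assert(inst != NULL);

    if (!is_commutative(inst->op))
        return false;
    if (inst->src0.file != IMM || inst->src1.file == IMM)
        return false;

    struct src tmp = inst->src0;
    inst->src0 = inst->src1;
    inst->src1 = tmp;
    return true;
}
```

With this, "MUL dst 4 x" becomes "MUL dst x 4", while a non-commutative
op such as a shift is left alone.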

I don't know.  Maybe we just want to make constant propagation do the
right thing on BDW+.  Matt?
--jason

> I think we can extend that - on Broadwell it should work fine, and
> might work fine for 16-bit immediates on Gen7 and Cherryview, too.
>
> Alternatively, I wrote a nir_opt_algebraic_late optimization that turns
> 4*x into x << 2, which works around the problem, and is also apparently
> much better for R600.
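
For reference, the arithmetic behind the 4*x => x << 2 rewrite can be
sketched in plain C.  This is only an illustration on unsigned 32-bit
values; it is not the NIR rule itself, and the helper names
mul_const_to_shift() and mul_by_const() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* If c is a power of two, store log2(c) in *shift and return true. */
static bool mul_const_to_shift(uint32_t c, unsigned *shift)
{
    assert(shift != NULL);

    if (c == 0 || (c & (c - 1)) != 0)
        return false;

    unsigned s = 0;
    while ((c >> s) != 1u)
        s++;
    *shift = s;
    return true;
}

/* Multiply x by the constant c, using a shift when c is a power of two.
 * On uint32_t both forms wrap modulo 2^32, so the results are identical. */
static uint32_t mul_by_const(uint32_t x, uint32_t c)
{
    unsigned shift;

    if (mul_const_to_shift(c, &shift))
        return x << shift;
    return x * c;
}
```

Constants that are not powers of two fall back to the ordinary multiply.
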
>
> Statistics on the shift patch are:
>
>     total instructions in shared programs: 7432587 -> 7388982 (-0.59%)
>     instructions in affected programs:     1360411 -> 1316806 (-3.21%)
>     helped:                                5772
>     HURT:                                  0
>
> Statistics for GLSL IR vs. NIR+(4*x => x << 2):
>
>     total instructions in shared programs: 7232451 -> 7175983 (-0.78%)
>     instructions in affected programs:     1586917 -> 1530449 (-3.56%)
>     helped:                                5780
>     HURT:                                  1654
>
> which is much better.
>
> Looking at a couple of the shaders that are still worse off...it looks
> like a ton of Source shaders used to do MUL/ADD with an attribute and
> two immediates, and now are doing MOV/MOV/MAD.