[Mesa-dev] [PATCH 0/3] i965: Delete all of the non-NIR vec4 code

Tue Sep 22 16:29:19 PDT 2015

On Mon, Sep 21, 2015 at 7:22 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
> On Mon, Sep 21, 2015 at 6:15 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>>
>> On Sep 21, 2015 5:45 PM, "Matt Turner" <mattst88 at gmail.com> wrote:
>>>
>>> On Mon, Sep 21, 2015 at 3:18 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>>> > At this point, piglit is the same as for GLSL and the shader-db numbers
>>> > are looking pretty good.  On SNB, GLSL vs. NIR for vec4 programs is:
>>> >
>>> >    total instructions in shared programs: 2020573 -> 1822601 (-9.80%)
>>> >    instructions in affected programs:     1883334 -> 1685362 (-10.51%)
>>> >    helped:                                13328
>>> >    HURT:                                  3594
>>> >
>>> > and there are patches on the list that improve this to
>>> >
>>> >    total instructions in shared programs: 2020283 -> 1805487 (-10.63%)
>>> >    instructions in affected programs:     1855759 -> 1640963 (-11.57%)
>>> >    helped:                                14142
>>> >    HURT:                                  2346
>>>
>>> Wow, that's great. I didn't realize we were that close.
>>>
>>> That said, I don't feel like we're /quite/ ready for this (especially
>>> with outstanding optimization patches on the list). I'm not sure what
>>> patches are pending.
>>
>> Only two: the one you sent today and Alejandro's patch to make copy
>> propagation less type-sensitive.
>>
>>> Some things I've seen in digging through hurt programs today:
>>>
>>> portal-2/high/5134 emits:
>>>
>>>         vec1 ssa_53 = flog2 ssa_52
>>>         vec1 ssa_54 = flog2 ssa_52.y
>>>         vec1 ssa_55 = flog2 ssa_52.z
>>>         vec4 ssa_56 = vec4 ssa_53, ssa_54, ssa_55, ssa_42.w
>>>         vec3 ssa_57 = fmul ssa_56, ssa_3
>>>         vec1 ssa_58 = fexp2 ssa_57
>>>         vec1 ssa_59 = fexp2 ssa_57.y
>>>         vec1 ssa_60 = fexp2 ssa_57.z
>>>         vec4 ssa_61 = vec4 ssa_58, ssa_59, ssa_60, ssa_42.w
>>>
>>> which we didn't transform into a vec3 pow with or without NIR but we
>>> really should. Why isn't NIR able to handle this?
>
> Ken and I were talking about this today.  What it comes down to is
> that no one has written the pass yet.  We haven't done that many
> vector optimizations to date.

Yeah, I'm not concerned about this. We weren't doing it before either,
so it's not a regression.

>>> (also, why isn't
>>> ".x" printed when the use of an ssa value is scalar, e.g., in the
>>> assignment of ssa_58 the RHS should use ssa_57.x).
>
> It doesn't print the identity swizzle.

Patch sent.

>>> We generate worse code for all_equal/any_nequal/any.
>
> Yes, we should fix that.  Suggestions/patches welcome, I don't have
> any hot ideas at the moment.
>
>>> book-of-unwritten-tales/original/vp-33 (a vertex program) uses
>>> DPH and NIR doesn't have DPH. NIR should probably grow a DPH
>>> instruction even if we don't have an optimization to recognize
>>> open-coded DPH.
>
> We could detect fdot(vec4(a.x, a.y, a.z, 1)) in the backend if we
> really wanted to.  The long-term solution is probably to add swizzle
> support to nir_search but that's going to be a real bear.  How would
> having the nir_op_dph instruction help if we can't recognize it?

Because the ARB vertex program language and the Mesa IR that we
translate from both have DPH already. We're just throwing it away
because NIR doesn't have DPH.

>>> Lots of things hurt because of lack of global copy/constant
>>> propagation. I think NIR often emits the constant loads in blocks
>>> earlier than their uses and the backend optimizations aren't able to
>>> cope. See team-fortress-2/2197 for example (search for 953267991D, the
>>> integer encoding of 0.0001F).
>
> Hrm...  One option would be to copy-prop load_const in emit_alu.  This
> should be easy enough to do if we detect that it's a 2-src and one
> source is an immediate.  We could also do global copy-prop but I don't
> know how hard that is.
>
>>> I remember this issue from the FS/NIR backend as well, but dota-2/504
>>> (and others) emit:
>>>
>>> mad(8)  g16<1>.xF  g11<4,4,1>.xF  g12<4,4,1>.xF  g2<4,4,1>.xF
>>> mad(8)  g19<1>.xF  g10<4,4,1>.xF  g12<4,4,1>.xF  g2<4,4,1>.xF
>>> mad(8)  g22<1>.xF  g9<4,4,1>.xF   g12<4,4,1>.xF  g2<4,4,1>.xF
>>> mad(8)  g25<1>.xF  g8<4,4,1>.xF   g12<4,4,1>.xF  g2<4,4,1>.xF
>>> mad(8)  g28<1>.xF  g7<4,4,1>.xF   g12<4,4,1>.xF  g2<4,4,1>.xF
>>> mad(8)  g31<1>.xF  g6<4,4,1>.xF   g12<4,4,1>.xF  g2<4,4,1>.xF
>>>
>>> where the multiplication is duplicated. I can't remember what we decided.
>
> If I remember correctly, it came down to "optimization is hard" and we
> said "good enough" about our current heuristics.

I dug out the old threads, but all I found was a snarky reply.

Patch sent.