[Mesa-dev] [PATCH 0/3] i965: Delete all of the non-NIR vec4 code

Tue Sep 22 16:39:39 PDT 2015

On Tue, Sep 22, 2015 at 4:29 PM, Matt Turner <mattst88 at gmail.com> wrote:
> On Mon, Sep 21, 2015 at 7:22 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>> On Mon, Sep 21, 2015 at 6:15 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>>>
>>> On Sep 21, 2015 5:45 PM, "Matt Turner" <mattst88 at gmail.com> wrote:
>>>>
>>>> On Mon, Sep 21, 2015 at 3:18 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>>>> > At this point, piglit is the same as for GLSL and the shader-db numbers
>>>> > are looking pretty good.  On SNB, GLSL vs. NIR for vec4 programs is:
>>>> >
>>>> >    total instructions in shared programs: 2020573 -> 1822601 (-9.80%)
>>>> >    instructions in affected programs:     1883334 -> 1685362 (-10.51%)
>>>> >    helped:                                13328
>>>> >    HURT:                                  3594
>>>> >
>>>> > and there are patches on the list that improve this to
>>>> >
>>>> >    total instructions in shared programs: 2020283 -> 1805487 (-10.63%)
>>>> >    instructions in affected programs:     1855759 -> 1640963 (-11.57%)
>>>> >    helped:                                14142
>>>> >    HURT:                                  2346
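The percentages above can be reproduced with the usual relative-change formula, (after - before) / before * 100. The sketch below assumes that is what the shader-db report computes, since it reproduces every printed figure. It also shows that the absolute saving is identical in the "total" and "affected" rows, because programs that did not change contribute nothing to the difference.

```python
def percent_change(before, after):
    """Relative change in percent, as printed in the shader-db summaries above."""
    return (after - before) / before * 100.0

# (label, before, after) taken from the two reports quoted above
rows = [
    ("total, current",     2020573, 1822601),
    ("affected, current",  1883334, 1685362),
    ("total, patched",     2020283, 1805487),
    ("affected, patched",  1855759, 1640963),
]

for label, before, after in rows:
    print(f"{label:18} {after - before:8d} ({percent_change(before, after):.2f}%)")
```

Each "total" row and its "affected" row share the same delta (-197972 and -214796), which is a quick sanity check when reading these reports.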
>>>>
>>>> Wow, that's great. I didn't realize we were that close.
>>>>
>>>> That said, I don't feel like we're /quite/ ready for this (especially
>>>> with outstanding optimization patches on the list). I'm not sure what
>>>> patches are pending.
>>>
>>> Only two: the one you sent today and Alejandro's patch to make copy
>>> propagation less type-sensitive.
>>>
>>>> Some things I've seen in digging through hurt programs today:
>>>>
>>>> portal-2/high/5134 emits:
>>>>
>>>>         vec1 ssa_53 = flog2 ssa_52
>>>>         vec1 ssa_54 = flog2 ssa_52.y
>>>>         vec1 ssa_55 = flog2 ssa_52.z
>>>>         vec4 ssa_56 = vec4 ssa_53, ssa_54, ssa_55, ssa_42.w
>>>>         vec3 ssa_57 = fmul ssa_56, ssa_3
>>>>         vec1 ssa_58 = fexp2 ssa_57
>>>>         vec1 ssa_59 = fexp2 ssa_57.y
>>>>         vec1 ssa_60 = fexp2 ssa_57.z
>>>>         vec4 ssa_61 = vec4 ssa_58, ssa_59, ssa_60, ssa_42.w
>>>>
>>>> which we didn't transform into a vec3 pow with or without NIR, but we
>>>> really should. Why isn't NIR able to handle this?
>>
>> Ken and I were talking about this today.  What it comes down to is
>> that no one has written the pass yet.  We haven't done that many
>> vector optimizations to date.
>
> Yeah, I'm not concerned about this. We weren't doing it before either,
> so it's not a regression.

Yeah, writing a vectorizer is something that should probably be done
but isn't any more urgent than any other "make vec4 better" thing.
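For positive a, exp2(log2(a) * b) equals pow(a, b), so the portal-2 sequence is three scalar pows that could be one vec3 fpow. The catch is that each flog2 result reaches the fmul through the vec4 constructor ssa_56, so a matcher has to follow individual components through vecN instructions. The sketch below is a toy model of that idea, not NIR's data structures or nir_search: `Src`, `Instr`, `component`, `match_pow` and `vectorize_pow` are all hypothetical names.

```python
from dataclasses import dataclass

CHANNELS = "xyzw"

@dataclass(frozen=True)
class Src:
    name: str       # SSA value being read
    swizzle: tuple  # component read for each channel of the instruction

@dataclass(frozen=True)
class Instr:
    op: str         # toy opcode: "flog2", "fmul", "fexp2", "vec4", ...
    srcs: tuple

def component(defs, name, chan):
    """Follow one component through vecN constructors to where it is computed."""
    instr = defs.get(name)
    while instr is not None and instr.op in ("vec2", "vec3", "vec4"):
        src = instr.srcs[chan]
        name, chan = src.name, src.swizzle[0]
        instr = defs.get(name)
    return name, chan

def match_pow(defs, name):
    """If scalar `name` computes exp2(log2(a) * b), return ((a, chan), (b, chan))."""
    exp2 = defs[name]
    if exp2.op != "fexp2":
        return None
    mul_name, c = component(defs, exp2.srcs[0].name, exp2.srcs[0].swizzle[0])
    mul = defs.get(mul_name)
    if mul is None or mul.op != "fmul":
        return None
    # fmul is commutative: try both operands as the flog2 side.
    for log_src, exp_src in ((mul.srcs[0], mul.srcs[1]), (mul.srcs[1], mul.srcs[0])):
        log_name, log_c = component(defs, log_src.name, log_src.swizzle[c])
        log = defs.get(log_name)
        if log is not None and log.op == "flog2":
            base = component(defs, log.srcs[0].name, log.srcs[0].swizzle[log_c])
            exponent = component(defs, exp_src.name, exp_src.swizzle[c])
            return base, exponent
    return None

def vectorize_pow(defs, names):
    """Merge per-channel exp2/mul/log2 chains into one vector fpow, or return None."""
    matches = [match_pow(defs, n) for n in names]
    if None in matches:
        return None
    base_names = {base[0] for base, _ in matches}
    exp_names = {exp[0] for _, exp in matches}
    if len(base_names) != 1 or len(exp_names) != 1:
        return None
    base_swz = "".join(CHANNELS[base[1]] for base, _ in matches)
    exp_swz = "".join(CHANNELS[exp[1]] for _, exp in matches)
    return f"vec{len(names)} fpow {base_names.pop()}.{base_swz}, {exp_names.pop()}.{exp_swz}"

# The portal-2/high/5134 sequence quoted above, in the toy representation.
defs = {
    "ssa_53": Instr("flog2", (Src("ssa_52", (0,)),)),
    "ssa_54": Instr("flog2", (Src("ssa_52", (1,)),)),
    "ssa_55": Instr("flog2", (Src("ssa_52", (2,)),)),
    "ssa_56": Instr("vec4", (Src("ssa_53", (0,)), Src("ssa_54", (0,)),
                             Src("ssa_55", (0,)), Src("ssa_42", (3,)))),
    "ssa_57": Instr("fmul", (Src("ssa_56", (0, 1, 2)), Src("ssa_3", (0, 1, 2)))),
    "ssa_58": Instr("fexp2", (Src("ssa_57", (0,)),)),
    "ssa_59": Instr("fexp2", (Src("ssa_57", (1,)),)),
    "ssa_60": Instr("fexp2", (Src("ssa_57", (2,)),)),
}

print(vectorize_pow(defs, ["ssa_58", "ssa_59", "ssa_60"]))
```

A real pass would also have to rewrite the uses of ssa_58..ssa_60 (here, the vec4 ssa_61) and check that ssa_52 and ssa_3 are available at the new instruction; the sketch only does the matching.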

>>>> (also, why isn't
>>>> ".x" printed when an ssa value is used as a scalar, e.g., in the
>>>> assignment of ssa_58 the RHS should read ssa_57.x?)
>>
>> It doesn't print the identity swizzle.
>
> Patch sent.
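One possible printing rule that stays terse without hiding scalar reads is to omit the swizzle only when it reads every component of the value, in order. The sketch below assumes that rule; `format_src` is a hypothetical helper, not NIR's printer API, and it is not necessarily what the patch above does.

```python
CHANNELS = "xyzw"

def format_src(name, swizzle, value_components):
    """Format an SSA source; drop the swizzle only when it is the identity
    over all components of the value (e.g. .xyz on a vec3)."""
    if tuple(swizzle) == tuple(range(value_components)):
        return name
    return name + "." + "".join(CHANNELS[c] for c in swizzle)

print(format_src("ssa_57", (0,), 3))       # scalar read of a vec3
print(format_src("ssa_57", (0, 1, 2), 3))  # full read of a vec3
print(format_src("ssa_53", (0,), 1))       # full read of a vec1
```

With this rule the RHS of ssa_58 prints as ssa_57.x, while full-width reads and vec1 values stay unadorned.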
>
>>>> We generate worse code for all_equal/any_nequal/any.
>>
>> Yes, we should fix that.  Suggestions/patches welcome; I don't have
>> any hot ideas at the moment.
>>
>>>> book-of-unwritten-tales/original/vp-33 (a vertex program) uses
>>>> DPH, and NIR doesn't have DPH. NIR should probably grow a DPH
>>>> instruction even if we don't have an optimization to recognize
>>>> open-coded DPH.
>>
>> We could detect fdot4(vec4(a.x, a.y, a.z, 1.0), b) in the backend if we
>> really wanted to.  The long-term solution is probably to add swizzle
>> support to nir_search but that's going to be a real bear.  How would
>> having the nir_op_dph instruction help if we can't recognize it?
>
> Because the ARB vertex program language and the Mesa IR that we
> translate from both have DPH already. We're just throwing it away
> because NIR doesn't have DPH.

Right.  Should be easy enough to add.  I can do that if you'd like or
you can; I don't care.
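For reference, ARB_vertex_program defines DPH as a homogeneous dot product, a.x*b.x + a.y*b.y + a.z*b.z + b.w, which is why it equals fdot4 with the first operand's w forced to 1.0. The sketch below shows that equivalence plus a toy check for the open-coded form; the source representation (a `(name, channel)` tuple or a float constant per vec4 component) and `is_open_coded_dph` are hypothetical, not a NIR API.

```python
def fdot4(a, b):
    """Plain 4-component dot product."""
    return sum(x * y for x, y in zip(a, b))

def dph(a, b):
    """ARB DPH: a.x*b.x + a.y*b.y + a.z*b.z + b.w (a.w is ignored)."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + b[3]

def is_open_coded_dph(vec_srcs):
    """Toy check on the four sources of a vec4 that feeds fdot4: the .x, .y, .z
    of a single value, followed by the constant 1.0."""
    x, y, z, w = vec_srcs
    if w != 1.0 or not all(isinstance(s, tuple) for s in (x, y, z)):
        return False
    return x[0] == y[0] == z[0] and (x[1], y[1], z[1]) == (0, 1, 2)

a = (1.0, 2.0, 3.0, 9.0)
b = (4.0, 5.0, 6.0, 7.0)
print(dph(a, b), fdot4((a[0], a[1], a[2], 1.0), b))
print(is_open_coded_dph([("a", 0), ("a", 1), ("a", 2), 1.0]))
```

Keeping DPH as an opcode avoids needing such a matcher for the ARB/Mesa IR path at all, since the source language already says DPH.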

>>>> Lots of things hurt because of the lack of global copy/constant
>>>> propagation. I think NIR often emits the constant loads in blocks
>>>> earlier than their uses and the backend optimizations aren't able to
>>>> cope. See team-fortress-2/2197 for example (search for 953267991D, the
>>>> bit pattern of 0.0001F printed as a decimal integer).
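The number to search for is the IEEE-754 single-precision encoding of 0.0001F written as a decimal integer; in hex it is 0x38d1b717. This can be checked with Python's standard `struct` module:

```python
import struct

def float_bits(x):
    """Return the IEEE-754 single-precision bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

bits = float_bits(0.0001)
print(bits, hex(bits))
```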
>>
>> Hrm...  One option would be to copy-prop load_const in emit_alu.  This
>> should be easy enough to do if we detect that it's a 2-src instruction
>> and one source is an immediate.  We could also do global copy-prop but
>> I don't know how hard that is.
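The emit_alu idea can be sketched as follows: because a load_const is an SSA value, its constant is known wherever it is used, even if it was emitted in an earlier block. The toy below assumes, as on Gen, that only the second source of a two-source instruction may be an immediate, so a constant in the first slot is swapped over only for commutative opcodes. `emit_alu`'s signature, the operand tuples and the `COMMUTATIVE` set are hypothetical, not the vec4 backend's code.

```python
# Hypothetical set of opcodes whose two sources may be swapped.
COMMUTATIVE = {"add", "mul", "and", "or"}

def emit_alu(op, srcs, load_consts):
    """Pick operands for a toy ALU instruction, folding one load_const as an immediate.

    srcs: SSA names; load_consts: SSA name -> constant for values defined by
    load_const (possibly in an earlier block). Returns (op, operands).
    """
    operands = [("reg", s) for s in srcs]
    if len(srcs) != 2:
        return op, operands  # sketch: only two-source instructions take an immediate
    if srcs[1] in load_consts:
        operands[1] = ("imm", load_consts[srcs[1]])
    elif srcs[0] in load_consts and op in COMMUTATIVE:
        # The immediate has to sit in the second slot, so swap the sources.
        operands = [("reg", srcs[1]), ("imm", load_consts[srcs[0]])]
    return op, operands

print(emit_alu("mul", ["ssa_10", "ssa_7"], {"ssa_10": 0.0001}))
```

Non-commutative opcodes with a constant first source, and three-source instructions like MAD, keep reading the constant from a register in this sketch.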
>>
>>>> I remember this issue from the FS/NIR backend as well, but dota-2/504
>>>> (and others) emit:
>>>>
>>>> mad(8)  g16<1>.xF  g11<4,4,1>.xF  g12<4,4,1>.xF  g2<4,4,1>.xF
>>>> mad(8)  g19<1>.xF  g10<4,4,1>.xF  g12<4,4,1>.xF  g2<4,4,1>.xF
>>>> mad(8)  g22<1>.xF  g9<4,4,1>.xF   g12<4,4,1>.xF  g2<4,4,1>.xF
>>>> mad(8)  g25<1>.xF  g8<4,4,1>.xF   g12<4,4,1>.xF  g2<4,4,1>.xF
>>>> mad(8)  g28<1>.xF  g7<4,4,1>.xF   g12<4,4,1>.xF  g2<4,4,1>.xF
>>>> mad(8)  g31<1>.xF  g6<4,4,1>.xF   g12<4,4,1>.xF  g2<4,4,1>.xF
>>>>
>>>> where the multiplication is duplicated. I can't remember what we decided.
>>
>> If I remember correctly, it came down to "optimization is hard" and we
>> said "good enough" about our current heuristics.
>
> I dug out the old threads, but all I found was a snarky reply.
>
> Patch sent.