<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Thu, Oct 18, 2018 at 2:49 PM Ian Romanick <<a href="mailto:idr@freedesktop.org">idr@freedesktop.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 10/17/2018 11:33 AM, Jason Ekstrand wrote:<br>
> From: Connor Abbott <<a href="mailto:cwabbott0@gmail.com" target="_blank">cwabbott0@gmail.com</a>><br>
> <br>
> Shader-db results on Haswell:<br>
> <br>
> total instructions in shared programs: 2180337 -> 2154080 (-1.20%)<br>
> instructions in affected programs: 959766 -> 933509 (-2.74%)<br>
> helped: 5653<br>
> HURT: 2560<br>
> <br>
> total cycles in shared programs: 12339326 -> 12307102 (-0.26%)<br>
> cycles in affected programs: 6102794 -> 6070570 (-0.53%)<br>
> helped: 3838<br>
> HURT: 4868<br>
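The percentages in the shader-db summary above appear to be the relative change (after - before) / before. Below is a minimal sketch that recomputes them from the quoted totals, assuming that formula. It is not taken from shader-db's report.py, and the function name `percent_change` is hypothetical.

```python
def percent_change(before: int, after: int) -> float:
    """Relative change in percent, assuming (after - before) / before."""
    return (after - before) / before * 100.0

# Before/after totals copied from the shader-db summary above.
RESULTS = {
    "total instructions in shared programs": (2180337, 2154080),
    "instructions in affected programs": (959766, 933509),
    "total cycles in shared programs": (12339326, 12307102),
    "cycles in affected programs": (6102794, 6070570),
}

if __name__ == "__main__":
    for name, (before, after) in RESULTS.items():
        # Format with an explicit sign, e.g. "-1.20%".
        print(f"{name}: {before} -> {after} ({percent_change(before, after):+.2f}%)")
```

The reported -1.20%, -2.74%, -0.26% and -0.53% all match this formula when rounded to two decimal places.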
<br>
In cases like this, the extra statistics generated by my extra changes<br>
to report.py can be informative. Give me a few minutes, and I'll gather<br>
that data.<br>
<br>
> Most of the hurt programs seem to be hurt because we generate extra MOVs due<br>
> to vectorizing things. For example, in<br>
> shaders/non-free/steam/anomaly-2/158.shader_test, this:<br>
> <br>
> add(8) g116<1>.xyF g12<4,4,1>.xyyyF g1.4<0,4,1>.xyyyF { align16 NoDDClr 1Q };<br>
> add(8) g117<1>.xyF g12<4,4,1>.xyyyF g1.4<0,4,1>.zwwwF { align16 NoDDClr 1Q };<br>
> add(8) g116<1>.zwF g12<4,4,1>.xxxyF -g1.4<0,4,1>.xxxyF { align16 NoDDChk 1Q };<br>
> add(8) g117<1>.zwF g12<4,4,1>.xxxyF -g1.4<0,4,1>.zzzwF { align16 NoDDChk 1Q };<br>
> <br>
> Turns into this:<br>
> <br>
> add(8) g13<1>F g12<4,4,1>.xyxyF g1.4<0,4,1>F { align16 1Q };<br>
> add(8) g14<1>F g12<4,4,1>.xyxyF -g1.4<0,4,1>F { align16 1Q };<br>
> mov(8) g116<1>.xyD g13<4,4,1>.xyyyD { align16 NoDDClr 1Q };<br>
> mov(8) g117<1>.xyD g13<4,4,1>.zwwwD { align16 NoDDClr 1Q };<br>
> mov(8) g116<1>.zwD g14<4,4,1>.xxxyD { align16 NoDDChk 1Q };<br>
> mov(8) g117<1>.zwD g14<4,4,1>.zzzwD { align16 NoDDChk 1Q };<br>
> <br>
> So we eliminated two ADDs, but then had to introduce four MOVs to<br>
> transpose the result. Some of the hurt is because vectorization is a bit<br>
> over-aggressive: we vectorize something when we should have left it<br>
> as a scalar and CSEd it. Unfortunately, this is all really tricky to get right,<br>
> as it involves the interactions between many different components.<br>
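To check that the vectorized sequence computes the same values as the original, and to count the instruction cost, here is a minimal Python model of Align16 semantics: source lane i reads the component named by swizzle[i], and only the components in the destination writemask are written. This is a simplified sketch under stated assumptions. It models a single vec4 (ignoring the second SIMD4x2 half and the &lt;0,4,1&gt; region), treats the D-typed MOVs as raw copies, and uses made-up input values. All helper names are hypothetical.

```python
COMPONENTS = "xyzw"

def as_reg(values):
    """Hypothetical model of one vec4 register: component name -> value."""
    return dict(zip(COMPONENTS, values))

def swizzle(reg, pattern, negate=False):
    """Source operand: lane i reads reg[pattern[i]], optionally negated."""
    sign = -1.0 if negate else 1.0
    return [sign * reg[c] for c in pattern]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def write(dst, writemask, lanes):
    """Destination: component i gets lanes[i] only if it is in the writemask."""
    for i, c in enumerate(COMPONENTS):
        if c in writemask:
            dst[c] = lanes[i]

# Assumed input values for g12 and g1.4 (illustrative only).
g12 = as_reg([1.0, 2.0, 3.0, 4.0])
g1_4 = as_reg([10.0, 20.0, 30.0, 40.0])

def scalar_version():
    """The original four ADDs writing g116/g117 directly."""
    g116, g117 = as_reg([0.0] * 4), as_reg([0.0] * 4)
    write(g116, "xy", add(swizzle(g12, "xyyy"), swizzle(g1_4, "xyyy")))
    write(g117, "xy", add(swizzle(g12, "xyyy"), swizzle(g1_4, "zwww")))
    write(g116, "zw", add(swizzle(g12, "xxxy"), swizzle(g1_4, "xxxy", negate=True)))
    write(g117, "zw", add(swizzle(g12, "xxxy"), swizzle(g1_4, "zzzw", negate=True)))
    return g116, g117, 4  # 4 instructions

def vectorized_version():
    """Two full-width ADDs, then four MOVs to transpose the result."""
    g13 = as_reg(add(swizzle(g12, "xyxy"), swizzle(g1_4, "xyzw")))
    g14 = as_reg(add(swizzle(g12, "xyxy"), swizzle(g1_4, "xyzw", negate=True)))
    g116, g117 = as_reg([0.0] * 4), as_reg([0.0] * 4)
    write(g116, "xy", swizzle(g13, "xyyy"))
    write(g117, "xy", swizzle(g13, "zwww"))
    write(g116, "zw", swizzle(g14, "xxxy"))
    write(g117, "zw", swizzle(g14, "zzzw"))
    return g116, g117, 6  # 2 ADDs + 4 MOVs

if __name__ == "__main__":
    print(scalar_version()[2], "->", vectorized_version()[2])  # prints "4 -> 6"
```

In this model both sequences produce identical g116 and g117, so the vectorized form is correct but costs a net two extra instructions for this snippet, which is the kind of hurt described above.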
<br>
To me, this suggests that vectorization should be done later in the<br>
optimization pipeline. I would have guessed that it would go after the<br>
regular optimization loop. Did you try calling it from other places to<br>
see the effects?<br></blockquote><div><br></div><div>No, I've done very little work on this. I mostly rebased Connor's patches, got them working again, and sent them to the list. Someone was asking about it on IRC in the context of old Mali hardware, I think. I was surprised to find we'd never actually landed it, so I decided to freshen it up a bit so that others could at least experiment with it again. Turns out that a lot has changed in NIR in the last three years...<br></div><div><br></div><div>--Jason<br></div></div></div>