[Mesa-dev] [PATCH 3/3] i965/nir: use vectorization for non-scalar stages

Thu Oct 18 19:51:19 UTC 2018

On Thu, Oct 18, 2018 at 2:49 PM Ian Romanick <idr at freedesktop.org> wrote:

> On 10/17/2018 11:33 AM, Jason Ekstrand wrote:
> > From: Connor Abbott <cwabbott0 at gmail.com>
> >
> > Shader-db results on Haswell:
> >
> >     total instructions in shared programs: 2180337 -> 2154080 (-1.20%)
> >     instructions in affected programs: 959766 -> 933509 (-2.74%)
> >     helped: 5653
> >     HURT: 2560
> >
> >     total cycles in shared programs: 12339326 -> 12307102 (-0.26%)
> >     cycles in affected programs: 6102794 -> 6070570 (-0.53%)
> >     helped: 3838
> >     HURT: 4868
>
> In cases like this, the extra statistics generated by my extra changes
> to report.py can be informative.  Give me a few minutes, and I'll gather
> that data.
>
> > Most of the hurt programs seem to be because we generate extra MOVs due
> > to vectorizing things. For example, in
> > shaders/non-free/steam/anomaly-2/158.shader_test, this:
> >
> > add(8)          g116<1>.xyF     g12<4,4,1>.xyyyF g1.4<0,4,1>.xyyyF { align16 NoDDClr 1Q };
> > add(8)          g117<1>.xyF     g12<4,4,1>.xyyyF g1.4<0,4,1>.zwwwF { align16 NoDDClr 1Q };
> > add(8)          g116<1>.zwF     g12<4,4,1>.xxxyF -g1.4<0,4,1>.xxxyF { align16 NoDDChk 1Q };
> > add(8)          g117<1>.zwF     g12<4,4,1>.xxxyF -g1.4<0,4,1>.zzzwF { align16 NoDDChk 1Q };
> >
> > Turns into this:
> >
> > add(8)          g13<1>F         g12<4,4,1>.xyxyF g1.4<0,4,1>F   { align16 1Q };
> > add(8)          g14<1>F         g12<4,4,1>.xyxyF -g1.4<0,4,1>F  { align16 1Q };
> > mov(8)          g116<1>.xyD     g13<4,4,1>.xyyyD                { align16 NoDDClr 1Q };
> > mov(8)          g117<1>.xyD     g13<4,4,1>.zwwwD                { align16 NoDDClr 1Q };
> > mov(8)          g116<1>.zwD     g14<4,4,1>.xxxyD                { align16 NoDDChk 1Q };
> > mov(8)          g117<1>.zwD     g14<4,4,1>.zzzwD                { align16 NoDDChk 1Q };
> >
> > So we eliminated two adds, but then had to introduce four movs to
> > transpose the result.  Some of the hurt is because vectorization is a bit
> > over-aggressive and we vectorize something when we should have left it
> > as a scalar and CSEd it.  Unfortunately, this is all really tricky to do
> > as it involves the interactions between many different components.
>
> This seems to me like vectorization should be done later in the
> optimization pipeline.  I would have guessed that it would go after the
> regular optimization loop.  Did you try calling it from other places to
> see the effects?
>

No, I've done very little work on this.  I mostly rebased Connor's patches,
got them working again, and sent them to the list.  Someone was asking
about it on IRC in the context of old Mali hardware, I think.  I was
surprised to find we'd never actually landed it so I decided to freshen it
up a bit so that others could at least experiment with it again.  Turns out
that a lot has changed in NIR in the last three years...
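
For anyone experimenting with this, the trade-off in the anomaly-2 example
quoted above (four adds before, two adds plus four movs after) can be checked
with a small model. The sketch below is not Mesa code. It assumes a simplified
view of align16 semantics in which a source swizzle picks one source channel
per destination channel and the destination writemask selects which channels
are written. The helper names (swz, write, original, vectorized) and the
register values are made up for illustration, and the D-typed movs are treated
as plain copies.

```python
# Simplified, hypothetical model of align16 swizzle/writemask behaviour.
# Registers are 4-element lists indexed in x, y, z, w order.
CHANNELS = "xyzw"

def swz(reg, pattern):
    """Apply a 4-character source swizzle such as 'xyyy'."""
    return [reg[CHANNELS.index(c)] for c in pattern]

def write(dst, mask, values):
    """Write only the channels named in the destination writemask."""
    for c in mask:
        i = CHANNELS.index(c)
        dst[i] = values[i]

def add(a, b):
    return [p + q for p, q in zip(a, b)]

def neg(a):
    return [-p for p in a]

def original(g12, u):
    """The four partial-writemask adds from before vectorization."""
    g116, g117 = [None] * 4, [None] * 4
    write(g116, "xy", add(swz(g12, "xyyy"), swz(u, "xyyy")))
    write(g117, "xy", add(swz(g12, "xyyy"), swz(u, "zwww")))
    write(g116, "zw", add(swz(g12, "xxxy"), neg(swz(u, "xxxy"))))
    write(g117, "zw", add(swz(g12, "xxxy"), neg(swz(u, "zzzw"))))
    return g116, g117

def vectorized(g12, u):
    """Two full-width adds followed by four movs that transpose the result."""
    g13 = add(swz(g12, "xyxy"), u)
    g14 = add(swz(g12, "xyxy"), neg(u))
    g116, g117 = [None] * 4, [None] * 4
    write(g116, "xy", swz(g13, "xyyy"))
    write(g117, "xy", swz(g13, "zwww"))
    write(g116, "zw", swz(g14, "xxxy"))
    write(g117, "zw", swz(g14, "zzzw"))
    return g116, g117

# Arbitrary integer test values standing in for g12 and g1.4.
g12 = [1, 2, 3, 4]
u = [10, 20, 30, 40]
assert original(g12, u) == vectorized(g12, u)
```

Under that model both sequences leave the same values in g116 and g117, so
the net effect of vectorizing here is purely the instruction count going from
four to six.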

--Jason