[Mesa-dev] [PATCH 1/4] nir: Do basic constant reassociation.

Matt Turner mattst88 at gmail.com
Tue Apr 12 00:11:32 UTC 2016


On Thu, Apr 7, 2016 at 4:35 PM, Kenneth Graunke <kenneth at whitecape.org> wrote:
> Many shaders contain expression trees of the form:
>
>     const_1 * (value * const_2)
>
> Reorganizing these to
>
>     (const_1 * const_2) * value
>
> will allow constant folding to combine the constants.  Sometimes, these
> constants are 2 and 0.5, so we can remove a multiply altogether.  Other
> times, it can create more immediate constants, which can actually hurt.
>
> Finding a good balance here is tricky.  While much more could be done,
> this simple patch seems to have a lot of positive benefit while having
> a low downside.
>
> shader-db results on Broadwell:
>
> total instructions in shared programs: 8963768 -> 8961369 (-0.03%)
> instructions in affected programs: 438318 -> 435919 (-0.55%)
> helped: 1502
> HURT: 245
>
> total cycles in shared programs: 71527354 -> 71421516 (-0.15%)
> cycles in affected programs: 11541788 -> 11435950 (-0.92%)
> helped: 3445
> HURT: 1224
>
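
To make the rewrite concrete, here is a toy sketch in Python. It is not NIR and not the patch's actual implementation; the Const/Var/Mul node types and the reassociate() helper are hypothetical. It turns const_1 * (value * const_2) into (const_1 * const_2) * value, and it drops the multiply entirely when the folded constant is 1.0, as in the 2 and 0.5 case. It ignores floating-point exactness. Reassociating float multiplies can change rounding in general, although scaling by powers of two such as 2 and 0.5 is exact barring overflow or underflow.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Mul:
    a: "Expr"
    b: "Expr"


Expr = Union[Const, Var, Mul]


def reassociate(e: Expr) -> Expr:
    """Rewrite c1 * (x * c2), in any operand order, to (c1 * c2) * x.

    The two constants are folded; if their product is 1.0 the multiply
    disappears and x is returned directly.
    """
    if not isinstance(e, Mul):
        return e
    a, b = reassociate(e.a), reassociate(e.b)
    # Canonicalize: put a constant operand first so only one shape is matched.
    if isinstance(b, Const) and not isinstance(a, Const):
        a, b = b, a
    if isinstance(a, Const) and isinstance(b, Mul):
        inner_a, inner_b = b.a, b.b
        if isinstance(inner_b, Const) and not isinstance(inner_a, Const):
            inner_a, inner_b = inner_b, inner_a
        if isinstance(inner_a, Const) and not isinstance(inner_b, Const):
            folded = a.value * inner_a.value  # constant folding
            if folded == 1.0:
                return inner_b  # e.g. 2 * (v * 0.5) -> v
            return Mul(Const(folded), inner_b)
    if isinstance(a, Const) and isinstance(b, Const):
        return Const(a.value * b.value)
    return Mul(a, b)
```

For example, reassociate(Mul(Const(2.0), Mul(Var("v"), Const(0.5)))) returns Var("v"), while a pair like 3 and 4 folds to Mul(Const(12.0), Var("v")). The second case is the one that can hurt: it still needs a multiply, now with a new immediate.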

The series is

Reviewed-by: Matt Turner <mattst88 at gmail.com>

The shaders most hurt by this patch are... funny. They do

        s_texcoord_0 = texcoord + offset * vec4(-1.5,-1.5,-0.5,-1.5);
        s_texcoord_1 = texcoord + offset * vec4( 0.5,-1.5, 1.5,-1.5);
        s_texcoord_2 = texcoord + offset * vec4(-1.5,-0.5,-0.5,-0.5);
        s_texcoord_3 = texcoord + offset * vec4( 0.5,-0.5, 1.5,-0.5);
        s_texcoord_4 = texcoord + offset * vec4(-1.5, 0.5,-0.5, 0.5);
        s_texcoord_5 = texcoord + offset * vec4( 0.5, 0.5, 1.5, 0.5);
        s_texcoord_6 = texcoord + offset * vec4(-1.5, 1.5,-0.5, 1.5);
        s_texcoord_7 = texcoord + offset * vec4( 0.5, 1.5, 1.5, 1.5);

Today, we generate 8 MOV instructions with VF immediates. We could
have just loaded 0.5, -0.5, 1.5, and -1.5 with a single VF immediate
and then swizzled that as needed, but how would we recognize that all
of these can be combined? NIR CSE? Part of the difficulty is that each
of the vec4s in the source language includes at most 3 of the four
distinct immediate floats -- none contains all four.
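One brute-force way to see that the eight constants can share one immediate is to gather every distinct component value across all of them and, if there are at most four, map each vector to a swizzle of that palette. This is a hypothetical Python illustration. build_palette() is not an existing NIR pass or API, and it ignores the harder questions a real pass would face, such as which constants to group and where to place the shared load.

```python
# The eight constant vec4s from the shader snippet above.
VECS = [
    (-1.5, -1.5, -0.5, -1.5),
    ( 0.5, -1.5,  1.5, -1.5),
    (-1.5, -0.5, -0.5, -0.5),
    ( 0.5, -0.5,  1.5, -0.5),
    (-1.5,  0.5, -0.5,  0.5),
    ( 0.5,  0.5,  1.5,  0.5),
    (-1.5,  1.5, -0.5,  1.5),
    ( 0.5,  1.5,  1.5,  1.5),
]

SWIZZLE_CHARS = "xyzw"


def build_palette(vecs):
    """Try to express every vector as a swizzle of a single 4-wide constant.

    Returns (palette, swizzles) on success, or None if the vectors use more
    than four distinct values in total.
    """
    palette = []
    for vec in vecs:
        for value in vec:
            if value not in palette:  # keep first-appearance order
                palette.append(value)
    if len(palette) > 4:
        return None
    swizzles = [
        "".join(SWIZZLE_CHARS[palette.index(value)] for value in vec)
        for vec in vecs
    ]
    return tuple(palette), swizzles
```

For the eight vectors above it returns the palette (-1.5, -0.5, 0.5, 1.5), and the first vector becomes palette.xxyx. No single input vector has more than three distinct values, so anything that looks at one constant at a time never sees the full palette. The grouping has to look across all eight.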

Anyway, these programs go from 28 to 38 instructions: when the
multiplications are reordered, the constants become 0.125, -0.125,
0.375, and -0.375, and since ±0.125 isn't representable as a VF, we
generate even more silly instructions!
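The representability problem can be checked with a small encoder. This Python sketch assumes the VF layout is 1 sign bit, a 3-bit exponent with bias 3, and a 4-bit mantissa with an implicit leading 1, with the all-zero magnitude encoding reserved for 0.0. Under that assumption ±0.125 is exactly the value that would collide with zero, which matches the claim above. float_to_vf() is a hypothetical helper, not Mesa's code.

```python
import struct


def float_to_vf(f: float) -> int:
    """Encode f as an 8-bit VF value, or return -1 if it is not representable.

    ASSUMED layout: 1 sign bit, 3-bit exponent (bias 3), 4-bit mantissa with
    an implicit leading 1. The all-zero magnitude is reserved for 0.0, so
    +-0.125 (which would otherwise use it) cannot be encoded.
    """
    # Reinterpret f as IEEE float32 bits (f is first rounded to float32).
    bits = struct.unpack("<I", struct.pack("<f", f))[0]
    sign = bits >> 31
    if f == 0.0:
        return sign << 7  # +0.0 -> 0x00, -0.0 -> 0x80
    exponent = ((bits >> 23) & 0xFF) - 127  # unbiased float32 exponent
    mantissa = bits & 0x7FFFFF
    if not -3 <= exponent <= 4:
        return -1  # outside the 3-bit exponent range (also catches inf/NaN)
    if mantissa & 0x7FFFF:
        return -1  # needs more than 4 mantissa bits
    field = exponent + 3
    mant4 = mantissa >> 19
    if field == 0 and mant4 == 0:
        return -1  # would collide with the encoding of 0.0
    return (sign << 7) | (field << 4) | mant4
```

All four of the original constants (0.5, -0.5, 1.5, -1.5) encode. The reordered constants are the originals scaled by 0.25 (0.5 -> 0.125, 1.5 -> 0.375), and of those only ±0.375 encode; float_to_vf(0.125) and float_to_vf(-0.125) return -1.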

If we could recognize all of these as swizzles of vec4(0.5, -0.5, 1.5,
-1.5) at the NIR level, we could cut those hurt programs down by a
lot.

Everything else just looks like noise to me.

