[Mesa-dev] [RFC 2/2] i965: switch fmul to increase chance of optimising it away

Sat Dec 31 04:04:38 UTC 2016

On Dec 30, 2016 9:57 PM, "Timothy Arceri" <timothy.arceri at collabora.com>
wrote:

On Fri, 2016-12-30 at 19:23 -0800, Jason Ekstrand wrote:
> On Dec 30, 2016 3:50 AM, "Timothy Arceri" <timothy.arceri at collabora.com> wrote:
> If one of the inputs to the multiplication in ffma is the result of
> an fmul, there is a chance that we can reuse the result of that
> fmul in other ffma calls if we do the multiplication in the right
> order.
>
> For example, it is a fairly common pattern for shaders to do something
> similar to this:
>
>   const float a = 0.5;
>   in vec4 b;
>   in float c;
>
>   ...
>
>   b.x = b.x * c;
>   b.y = b.y * c;
>
>   ...
>
>   b.x = b.x * a + a;
>   b.y = b.y * a + a;
>
> So by simply detecting that the constant a is part of the multiplication
> in ffma and switching it with the previous fmul that updates b, we end up
> with:
>
>   ...
>
>   c = a * c;
>
>   ...
>
>   b.x = b.x * c + a;
>   b.y = b.y * c + a;
>
> shader-db results BDW:
>
> total instructions in shared programs: 13065888 -> 13045434 (-0.16%)
> instructions in affected programs: 2436228 -> 2415774 (-0.84%)
> helped: 10261
> HURT: 30
>
> Nice!  Those are some impressive instruction count reductions.
>
> I'm not sure what I think of the approach though.  We could probably
> also do this in the ffma peephole itself.

Yeah, most of the improvement comes from fixing the output of the ffma
peephole, but there are still others that occur elsewhere.

But it only affects things used in ffma...  Curious.
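
To make the reordering from the commit message concrete, here is a minimal
Python sketch of the two forms. The function names `before` and `after` are
hypothetical and only mirror the GLSL snippet quoted above; this is not Mesa
code. It shows one multiply fewer once the constant is folded into c. Note
that reassociating floating-point multiplies can change rounding in general;
with a = 0.5 the scaling is exact (barring underflow), so both forms agree
here.

```python
# Hypothetical illustration; variable names follow the quoted GLSL example.

def before(bx, by, c, a=0.5):
    # Original order: two fmuls by c, then two multiply-adds by the constant a.
    bx = bx * c
    by = by * c
    return bx * a + a, by * a + a  # 4 multiplies in total


def after(bx, by, c, a=0.5):
    # Reordered: fold the constant into c once, then reuse that product
    # in both multiply-adds.
    c = a * c
    return bx * c + a, by * c + a  # 3 multiplies in total
```

The saving grows with the number of components that share c, since the
a * c product is computed once regardless of how many ffmas consume it.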

>
> total cycles in shared programs: 253619698 -> 253418728 (-0.08%)
> cycles in affected programs: 141182838 -> 140981868 (-0.14%)
> helped: 8853
> HURT: 3162
>
> total loops in shared programs: 2952 -> 2918 (-1.15%)
> loops in affected programs: 66 -> 32 (-51.52%)
> helped: 22
> HURT: 0
>
> total spills in shared programs: 15106 -> 14840 (-1.76%)
> spills in affected programs: 8475 -> 8209 (-3.14%)
> helped: 287
> HURT: 31
>
> total fills in shared programs: 20210 -> 19708 (-2.48%)
> fills in affected programs: 12054 -> 11552 (-4.16%)
> helped: 293
> HURT: 28
>
> LOST:   8
> GAINED: 5
>
> All the HURT besides an increase of a single instruction in a
> yofrankie shader comes from deus-ex; however, the helped
> fills/spills/instructions far outweigh the HURT for other deus-ex
> shaders.
> ---
>  src/compiler/nir/nir_opt_algebraic.py | 1 +
>  src/mesa/drivers/dri/i965/brw_nir.c   | 1 +
>  2 files changed, 2 insertions(+)
>
> diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py
> index 982f8b2..b13e484 100644
> --- a/src/compiler/nir/nir_opt_algebraic.py
> +++ b/src/compiler/nir/nir_opt_algebraic.py
> @@ -111,6 +111,7 @@ optimizations = [
>     (('~ffma', a, b, 0.0), ('fmul', a, b)),
>     (('ffma', a, 1.0, b), ('fadd', a, b)),
>     (('ffma', 1.0, a, b), ('fadd', a, b)),
> +   (('ffma', ('!fmul', a, b), '#c', d), ('ffma', a, ('fmul', c, b), d)),
>     (('~flrp', a, b, 0.0), a),
>     (('~flrp', a, b, 1.0), b),
>     (('~flrp', a, a, b), a),
> diff --git a/src/mesa/drivers/dri/i965/brw_nir.c b/src/mesa/drivers/dri/i965/brw_nir.c
> index 6f37e97..7babc54 100644
> --- a/src/mesa/drivers/dri/i965/brw_nir.c
> +++ b/src/mesa/drivers/dri/i965/brw_nir.c
> @@ -550,6 +550,7 @@ brw_postprocess_nir(nir_shader *nir, const struct brw_compiler *compiler,
>     if (devinfo->gen >= 6) {
>        /* Try and fuse multiply-adds */
>        OPT(brw_nir_opt_peephole_ffma);
> +      nir = nir_optimize(nir, compiler, is_scalar);
>
> Why not just put this optimization in the late bucket with the other
> post-ffma optimizations?

Well we need to at least do DCE after peephole ffma, but this is also
triggering extra loop unrolling (haven't really explored why).

Interesting.  In that case, there may be something we should be doing
earlier.

I did have it in with the late calls at one point while testing but
took it out as the results were not great, although that was before I
had the previous patch from you.

Interesting...

>
>     }
>
>     OPT(nir_opt_algebraic_late);
> --
> 2.9.3
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
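
For readers unfamiliar with the tuple-style rules in the patch, the following
is a toy Python sketch of what the added rule does to an expression tree. It
is an assumption-laden illustration, not the nir_algebraic machinery: the
helpers `is_const`, `rewrite_ffma` and `evaluate` are hypothetical, variables
are modelled as strings and constants as Python numbers, and the '!' marker
on fmul in the real rule is not modelled at all.

```python
# Toy model: expressions are tuples like ('ffma', x, y, z) or ('fmul', x, y).
# Strings stand for variables, numbers for constants. All helper names here
# are hypothetical and exist only for this sketch.

def is_const(x):
    return isinstance(x, (int, float))


def rewrite_ffma(expr):
    """Turn ffma(fmul(a, b), c, d) into ffma(a, fmul(c, b), d) for constant c."""
    if not (isinstance(expr, tuple) and len(expr) == 4 and expr[0] == 'ffma'):
        return expr
    _, src0, c, d = expr
    if (isinstance(src0, tuple) and len(src0) == 3
            and src0[0] == 'fmul' and is_const(c)):
        _, a, b = src0
        return ('ffma', a, ('fmul', c, b), d)
    return expr  # pattern did not match; leave the expression alone


def evaluate(expr, env):
    """Evaluate an expression tree given variable values in env."""
    if isinstance(expr, str):
        return env[expr]
    if is_const(expr):
        return expr
    op, *srcs = expr
    vals = [evaluate(s, env) for s in srcs]
    if op == 'fmul':
        return vals[0] * vals[1]
    if op == 'ffma':
        return vals[0] * vals[1] + vals[2]
    raise ValueError("unknown op: %r" % (op,))
```

After the rewrite, the inner ('fmul', c, b) no longer depends on a, so two
ffmas that differ only in a produce an identical inner multiply that a later
CSE pass could in principle merge, which is the reuse the commit message
describes.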