[Mesa-dev] [RFC 2/2] i965: switch fmul to increase chance of optimising it away

Jason Ekstrand jason at jlekstrand.net
Sat Dec 31 03:23:38 UTC 2016


On Dec 30, 2016 3:50 AM, "Timothy Arceri" <timothy.arceri at collabora.com>
wrote:

If one of the inputs to the multiplication in ffma is the result of
an fmul, there is a chance that we can reuse the result of that
fmul in other ffma calls if we do the multiplication in the right
order.

For example it is a fairly common pattern for shaders to do something
similar to this:

  const float a = 0.5;
  in vec4 b;
  in float c;

  ...

  b.x = b.x * c;
  b.y = b.y * c;

  ...

  b.x = b.x * a + a;
  b.y = b.y * a + a;

So by simply detecting that constant a is part of the multiplication
in ffma and switching it with the previous fmul that updates b, we end
up with:

  ...

  c = a * c;

  ...

  b.x = b.x * c + a;
  b.y = b.y * c + a;
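
To make the saving concrete, here is a minimal Python sketch of the same
rewrite on toy expression trees. It is not NIR code: the tuple
representation and the helpers rewrite(), count_instructions() and
evaluate() are made up for illustration, and the instruction count assumes
identical subexpressions are always shared (perfect CSE).

```python
# Toy model only: ("op", operand, ...) tuples, str = variable, float = constant.
# None of these helpers exist in Mesa; they just illustrate the reordering.

def rewrite(expr):
    """Apply ffma(fmul(a, b), const c, d) -> ffma(a, fmul(c, b), d) bottom-up."""
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [rewrite(arg) for arg in args]
    if op == "ffma":
        mul, c, d = args
        if isinstance(mul, tuple) and mul[0] == "fmul" and isinstance(c, float):
            _, a, b = mul
            return ("ffma", a, ("fmul", c, b), d)
    return (op, *args)

def count_instructions(exprs):
    """Count distinct operations, assuming identical subtrees are shared."""
    seen = set()
    def visit(e):
        if isinstance(e, tuple) and e not in seen:
            seen.add(e)
            for arg in e[1:]:
                visit(arg)
    for e in exprs:
        visit(e)
    return len(seen)

def evaluate(expr, env):
    """Evaluate a toy expression with variable values taken from env."""
    if isinstance(expr, str):
        return env[expr]
    if isinstance(expr, float):
        return expr
    op, *args = expr
    vals = [evaluate(arg, env) for arg in args]
    if op == "fmul":
        return vals[0] * vals[1]
    if op == "ffma":
        return vals[0] * vals[1] + vals[2]
    raise ValueError(op)

A = 0.5
# b.x = (b.x * c) * a + a;  b.y = (b.y * c) * a + a;
before = [("ffma", ("fmul", "b.x", "c"), A, A),
          ("ffma", ("fmul", "b.y", "c"), A, A)]
after = [rewrite(e) for e in before]

print(count_instructions(before), "->", count_instructions(after))
```

In this model the two per-component fmuls collapse into a single shared
fmul of a and c, so each additional component that goes through the same
pattern saves one multiply. Note that reassociating floating-point
multiplies is not bit-exact in general, so the two forms can differ in
rounding for some inputs.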

shader-db results BDW:

total instructions in shared programs: 13065888 -> 13045434 (-0.16%)
instructions in affected programs: 2436228 -> 2415774 (-0.84%)
helped: 10261
HURT: 30


Nice!  Those are some impressive instruction count reductions.

I'm not sure what I think of the approach though.  We could probably also
do this in the ffma peephole itself.

total cycles in shared programs: 253619698 -> 253418728 (-0.08%)
cycles in affected programs: 141182838 -> 140981868 (-0.14%)
helped: 8853
HURT: 3162

total loops in shared programs: 2952 -> 2918 (-1.15%)
loops in affected programs: 66 -> 32 (-51.52%)
helped: 22
HURT: 0

total spills in shared programs: 15106 -> 14840 (-1.76%)
spills in affected programs: 8475 -> 8209 (-3.14%)
helped: 287
HURT: 31

total fills in shared programs: 20210 -> 19708 (-2.48%)
fills in affected programs: 12054 -> 11552 (-4.16%)
helped: 293
HURT: 28

LOST:   8
GAINED: 5

All the HURT, besides an increase of a single instruction in a
yofrankie shader, comes from deus-ex; however, the helped
fills/spills/instructions far outweigh the HURT for other deus-ex
shaders.
---
 src/compiler/nir/nir_opt_algebraic.py | 1 +
 src/mesa/drivers/dri/i965/brw_nir.c   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py
index 982f8b2..b13e484 100644
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -111,6 +111,7 @@ optimizations = [
    (('~ffma', a, b, 0.0), ('fmul', a, b)),
    (('ffma', a, 1.0, b), ('fadd', a, b)),
    (('ffma', 1.0, a, b), ('fadd', a, b)),
+   (('ffma', ('!fmul', a, b), '#c', d), ('ffma', a, ('fmul', c, b), d)),
    (('~flrp', a, b, 0.0), a),
    (('~flrp', a, b, 1.0), b),
    (('~flrp', a, a, b), a),
diff --git a/src/mesa/drivers/dri/i965/brw_nir.c b/src/mesa/drivers/dri/i965/brw_nir.c
index 6f37e97..7babc54 100644
--- a/src/mesa/drivers/dri/i965/brw_nir.c
+++ b/src/mesa/drivers/dri/i965/brw_nir.c
@@ -550,6 +550,7 @@ brw_postprocess_nir(nir_shader *nir, const struct brw_compiler *compiler,
    if (devinfo->gen >= 6) {
       /* Try and fuse multiply-adds */
       OPT(brw_nir_opt_peephole_ffma);
+      nir = nir_optimize(nir, compiler, is_scalar);


Why not just put this optimization in the late bucket with the other
post-ffma optimizations?

    }

    OPT(nir_opt_algebraic_late);
--
2.9.3

_______________________________________________
mesa-dev mailing list
mesa-dev at lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev