[Mesa-dev] [PATCH 5/5] nir: Lower flrp differently when the alpha value is reused

Mon Aug 13 23:34:21 UTC 2018

From: Ian Romanick <ian.d.romanick at intel.com>

For some reason, the new lowering never triggered unless I moved the
regular flrp lowering to late_optimizations.  Since the regular lowering
emits fsub, the fsub lowering also had to be added to
late_optimizations, and that requires "intel/compiler: Repeat
nir_opt_algebraic_late until no more progress".
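The toy rewriter below is one way to picture how the placement of a rule can matter. It is a hypothetical sketch, not NIR's matcher: expressions are plain Python tuples, every function name is invented, and it assumes operands are rewritten before the expression that uses them. Under that assumption, putting both rules in the same pass lets the generic flrp lowering consume the inner flrp expressions before the shared-alpha rule ever sees the enclosing fmul. Deferring the generic lowering to a second pass lets the special case match.

```python
# Hypothetical toy rewriter; NOT NIR's nir_algebraic matching code.
# Expressions are tuples (op, operand, ...); leaves are names or floats.


def lower_flrp(expr):
    """Generic lowering: flrp(a, b, c) -> c * (b - a) + a."""
    if isinstance(expr, tuple) and expr[0] == "flrp":
        _, a, b, c = expr
        return ("fadd", ("fmul", c, ("fsub", b, a)), a)
    return None


def lower_flrp_pair(expr):
    """flrp(a, b, c) * flrp(d, e, c) -> expansion sharing (1 - c)."""
    if not (isinstance(expr, tuple) and expr[0] == "fmul"):
        return None
    x, y = expr[1], expr[2]
    if not all(isinstance(t, tuple) and t[0] == "flrp" for t in (x, y)):
        return None
    (_, a, b, c), (_, d, e, c2) = x, y
    if c != c2:
        return None  # alpha values differ, so the rule does not apply
    one_minus_c = ("fsub", 1.0, c)
    return ("fmul",
            ("fadd", ("fmul", a, one_minus_c), ("fmul", b, c)),
            ("fadd", ("fmul", d, one_minus_c), ("fmul", e, c)))


def rewrite(expr, rules):
    """Rewrite operands first (bottom-up), then try each rule in order.

    Visiting operands before their user is an ASSUMPTION standing in for
    walking instructions in definition order.
    """
    if not isinstance(expr, tuple):
        return expr
    expr = (expr[0],) + tuple(rewrite(op, rules) for op in expr[1:])
    for rule in rules:
        result = rule(expr)
        if result is not None:
            return result
    return expr


src = ("fmul", ("flrp", "a", "b", "c"), ("flrp", "d", "e", "c"))

# Both rules in one pass: the inner flrps are lowered before the enclosing
# fmul is examined, so the shared-alpha rule never matches.
same_pass = rewrite(src, [lower_flrp_pair, lower_flrp])

# Generic lowering deferred to a later pass: the shared-alpha rule fires.
late = rewrite(rewrite(src, [lower_flrp_pair]), [lower_flrp])
print(same_pass)
print(late)
```

This only illustrates one mechanism that produces the symptom described above; it is not a diagnosis of nir_opt_algebraic.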

The loops removed by this patch are the same loops that were added by
"intel/compiler: Don't emit flrp for Gen4 or Gen5".

I am CC'ing the people responsible for drivers that set lower_flrp32,
as this patch will likely affect shader-db results for those drivers.

No changes on any Gen6+ platform.

Iron Lake
total instructions in shared programs: 7730019 -> 7731893 (0.02%)
instructions in affected programs: 139980 -> 141854 (1.34%)
helped: 262
HURT: 329
helped stats (abs) min: 1 max: 4 x̄: 3.00 x̃: 3
helped stats (rel) min: 0.11% max: 4.69% x̄: 1.70% x̃: 1.30%
HURT stats (abs)   min: 1 max: 19 x̄: 8.09 x̃: 7
HURT stats (rel)   min: 0.32% max: 23.53% x̄: 5.10% x̃: 4.74%
95% mean confidence interval for instructions value: 2.62 3.72
95% mean confidence interval for instructions %-change: 1.73% 2.44%
Instructions are HURT.

total cycles in shared programs: 177866190 -> 177851638 (<.01%)
cycles in affected programs: 18970354 -> 18955802 (-0.08%)
helped: 1700
HURT: 962
helped stats (abs) min: 2 max: 70 x̄: 17.40 x̃: 16
helped stats (rel) min: <.01% max: 3.36% x̄: 0.37% x̃: 0.23%
HURT stats (abs)   min: 2 max: 114 x̄: 15.62 x̃: 6
HURT stats (rel)   min: <.01% max: 10.50% x̄: 0.98% x̃: 0.39%
95% mean confidence interval for cycles value: -6.33 -4.60
95% mean confidence interval for cycles %-change: 0.07% 0.16%
Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).

total loops in shared programs: 854 -> 850 (-0.47%)
loops in affected programs: 4 -> 0
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for loops value: -1.00 -1.00
95% mean confidence interval for loops %-change: -100.00% -100.00%
Loops are helped.

GM45
total instructions in shared programs: 4769335 -> 4770019 (0.01%)
instructions in affected programs: 90821 -> 91505 (0.75%)
helped: 219
HURT: 167
helped stats (abs) min: 1 max: 4 x̄: 3.00 x̃: 3
helped stats (rel) min: 0.11% max: 4.35% x̄: 1.70% x̃: 1.30%
HURT stats (abs)   min: 1 max: 19 x̄: 8.02 x̃: 7
HURT stats (rel)   min: 0.32% max: 22.86% x̄: 4.95% x̃: 4.57%
95% mean confidence interval for instructions value: 1.12 2.43
95% mean confidence interval for instructions %-change: 0.77% 1.59%
Instructions are HURT.

total cycles in shared programs: 121980262 -> 121970888 (<.01%)
cycles in affected programs: 12861602 -> 12852228 (-0.07%)
helped: 1040
HURT: 492
helped stats (abs) min: 2 max: 70 x̄: 17.65 x̃: 16
helped stats (rel) min: <.01% max: 3.36% x̄: 0.32% x̃: 0.21%
HURT stats (abs)   min: 2 max: 114 x̄: 18.26 x̃: 6
HURT stats (rel)   min: <.01% max: 10.50% x̄: 1.00% x̃: 0.35%
95% mean confidence interval for cycles value: -7.34 -4.89
95% mean confidence interval for cycles %-change: 0.05% 0.17%
Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).

total loops in shared programs: 631 -> 629 (-0.32%)
loops in affected programs: 2 -> 0
helped: 2
HURT: 0
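As a sanity check on the summaries above, the relative changes and per-program means can be recomputed from the raw counts. This is a minimal sketch, not the shader-db report script. It assumes that "%" means (after - before) / before, rounded to two places, and that "affected programs" are exactly the helped plus HURT programs. Neither formula is taken from the report tool itself.

```python
# Consistency check on the shader-db numbers quoted above (not part of the patch).
# ASSUMPTIONS: "%" is (after - before) / before * 100 rounded to two places,
# and "affected programs" are exactly the helped + HURT programs.

# (platform, metric, before, after, helped, HURT, 95% CI for the value mean)
AFFECTED = [
    ("Iron Lake", "instructions", 139980, 141854, 262, 329, (2.62, 3.72)),
    ("Iron Lake", "cycles", 18970354, 18955802, 1700, 962, (-6.33, -4.60)),
    ("GM45", "instructions", 90821, 91505, 219, 167, (1.12, 2.43)),
    ("GM45", "cycles", 12861602, 12852228, 1040, 492, (-7.34, -4.89)),
]


def pct_change(before, after):
    """Relative change in percent, rounded as the report prints it."""
    return round((after - before) / before * 100.0, 2)


def mean_delta(before, after, helped, hurt):
    """Average change per affected program."""
    return (after - before) / (helped + hurt)


for platform, metric, before, after, helped, hurt, (lo, hi) in AFFECTED:
    mean = mean_delta(before, after, helped, hurt)
    print(f"{platform} {metric}: {pct_change(before, after):+.2f}%, "
          f"mean {mean:+.2f} per program, within CI: {lo <= mean <= hi}")
```

For example, pct_change(854, 850) reproduces the -0.47% loop change reported for Iron Lake.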

Signed-off-by: Ian Romanick <ian.d.romanick at intel.com>
Cc: Marek Olšák <marek.olsak at amd.com>
Cc: Rob Clark <robdclark at gmail.com>
Cc: Eric Anholt <eric at anholt.net>
---
 src/compiler/nir/nir_opt_algebraic.py | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)
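The rules added by this patch rely on the identity flrp(a, b, c) = a(1-c) + bc = c(b-a) + a. The Python check below is a minimal sketch that is not part of the patch, and its function names are made up. It compares the existing late lowering against the shared-(1-c) expansion on random double-precision inputs. It only confirms the algebra and says nothing about how either form rounds in float32 on real hardware.

```python
import math
import random


def flrp_late(a, b, c):
    """Generic lowering kept in late_optimizations: c * (b - a) + a."""
    return c * (b - a) + a


def flrp_pair(a, b, d, e, c):
    """Expansion used for flrp(a, b, c) * flrp(d, e, c), sharing (1 - c)."""
    one_minus_c = 1.0 - c
    return (a * one_minus_c + b * c) * (d * one_minus_c + e * c)


rng = random.Random(1234)  # fixed seed so the check is reproducible
for _ in range(1000):
    a, b, d, e = (rng.uniform(-10.0, 10.0) for _ in range(4))
    c = rng.uniform(0.0, 1.0)
    expected = flrp_late(a, b, c) * flrp_late(d, e, c)
    # The two forms round differently, so compare with a tolerance
    # rather than exactly.
    assert math.isclose(flrp_pair(a, b, d, e, c), expected,
                        rel_tol=1e-9, abs_tol=1e-9)
print("expansions agree on 1000 random inputs")
```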

diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py
index f11a987c462..54f901e6cad 100644
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -120,8 +120,6 @@ optimizations = [
    (('flrp@64', 1.0, b, c), ('fadd', ('fsub', 1.0, c), ('fmul', b, c)), 'options->lower_flrp64'),
    (('flrp@32', a, 1.0, c), ('fadd', a, ('fmul', c, ('fsub', 1.0, a))), 'options->lower_flrp32'),
    (('flrp@64', a, 1.0, c), ('fadd', a, ('fmul', c, ('fsub', 1.0, a))), 'options->lower_flrp64'),
-   (('flrp@32', a, b, c), ('fadd', ('fmul', c, ('fsub', b, a)), a), 'options->lower_flrp32'),
-   (('flrp@64', a, b, c), ('fadd', ('fmul', c, ('fsub', b, a)), a), 'options->lower_flrp64'),
    (('ffract', a), ('fsub', a, ('ffloor', a)), 'options->lower_ffract'),
    (('~fadd', ('fmul', a, ('fadd', 1.0, ('fneg', ('b2f', c)))), ('fmul', b, ('b2f', c))), ('bcsel', c, b, a), 'options->lower_flrp32'),
    (('~fadd@32', ('fmul', a, ('fadd', 1.0, ('fneg',         c ))), ('fmul', b,         c )), ('flrp', a, b, c), '!options->lower_flrp32'),
@@ -134,6 +132,30 @@ optimizations = [
    (('ffma', a, b, c), ('fadd', ('fmul', a, b), c), 'options->lower_ffma'),
    (('~fadd', ('fmul', a, b), c), ('ffma', a, b, c), 'options->fuse_ffma'),
 
+   # flrp(a, b, c) * flrp(d, e, c)
+   # (a(1-c) + bc) * (d(1-c) + ec)
+   #
+   # Since (1-c) is common to both factors, this is one operation fewer
+   # than the other expansion.
+   (('fmul', ('flrp@32', a, b, c), ('flrp@32', d, 'e', c)),
+    ('fmul', ('fadd', ('fmul', a, ('fsub', 1.0, c)), ('fmul', 'b', c)),
+             ('fadd', ('fmul', d, ('fsub', 1.0, c)), ('fmul', 'e', c))),
+    'options->lower_flrp32'),
+   (('fmul', ('flrp@64', a, b, c), ('flrp@64', d, 'e', c)),
+    ('fmul', ('fadd', ('fmul', a, ('fsub', 1.0, c)), ('fmul', 'b', c)),
+             ('fadd', ('fmul', d, ('fsub', 1.0, c)), ('fmul', 'e', c))),
+    'options->lower_flrp64'),
+
+   # (f * flrp(a, b, c)) * flrp(d, e, c)
+   (('fmul', ('fmul', 'f', ('flrp@32', a, b, c)), ('flrp@32', d, 'e', c)),
+    ('fmul', ('fmul', 'f', ('fadd', ('fmul', a, ('fsub', 1.0, c)), ('fmul', 'b', c))),
+             ('fadd', ('fmul', d, ('fsub', 1.0, c)), ('fmul', 'e', c))),
+    'options->lower_flrp32'),
+   (('fmul', ('fmul', 'f', ('flrp@64', a, b, c)), ('flrp@64', d, 'e', c)),
+    ('fmul', ('fmul', 'f', ('fadd', ('fmul', a, ('fsub', 1.0, c)), ('fmul', 'b', c))),
+             ('fadd', ('fmul', d, ('fsub', 1.0, c)), ('fmul', 'e', c))),
+    'options->lower_flrp64'),
+
    (('fdot4', ('vec4', a, b,   c,   1.0), d), ('fdph',  ('vec3', a, b, c), d)),
    (('fdot4', ('vec4', a, 0.0, 0.0, 0.0), b), ('fmul', a, b)),
    (('fdot4', ('vec4', a, b,   0.0, 0.0), c), ('fdot2', ('vec2', a, b), c)),
@@ -887,6 +909,10 @@ late_optimizations = [
 
    # Lowered for backends without a dedicated b2f instruction
    (('b2f@32', a), ('iand', a, 1.0), 'options->lower_b2f'),
+
+   (('flrp@32', a, b, c), ('fadd', ('fmul', c, ('fsub', b, a)), a), 'options->lower_flrp32'),
+   (('flrp@64', a, b, c), ('fadd', ('fmul', c, ('fsub', b, a)), a), 'options->lower_flrp64'),
+   (('fsub', a, b), ('fadd', a, ('fneg', b)), 'options->lower_sub'),
 ]
 
 print(nir_algebraic.AlgebraicPass("nir_opt_algebraic", optimizations).render())
-- 
2.14.4