[Mesa-dev] [PATCH 11/12] nir: Reassociate open-coded flrp(1, b, c)

Ian Romanick idr at freedesktop.org
Sat Aug 25 05:52:16 UTC 2018


From: Ian Romanick <ian.d.romanick at intel.com>

In a previous version of this patch, Jason commented,

   "Re-associating based on whether or not something has a constant
   value of 1.0 seems a bit sneaky.  I think it's well within the rules
   but it seems like something that could bite you."

That is possibly true.  The reassociation will generate different
results if fabs(b) >= 2**24 and fabs(c) < 0.5.  The delta increases as
fabs(c) approaches 0.
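
As a rough illustration of that difference, the sketch below emulates
32-bit float arithmetic in Python by rounding to float32 after every
operation.  The helper names (f32, open_coded, reassociated) and the
chosen inputs are illustrative assumptions, not part of the patch.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def open_coded(b, c):
    # 1 + c * (b - 1), rounding to float32 after every operation
    return f32(1.0 + f32(c * f32(b - 1.0)))

def reassociated(b, c):
    # (1 - c) + b * c, rounding to float32 after every operation
    return f32(f32(1.0 - c) + f32(b * c))

b = f32(2.0**24 + 2.0)  # fabs(b) >= 2**24
c = f32(0.375)          # fabs(c) < 0.5

print(open_coded(b, c))    # 6291457.0
print(reassociated(b, c))  # 6291457.5
```

The exact result for these inputs is 6291457.375, so neither form is
exact; they simply round differently because b - 1 is not
representable in 32 bits.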

However, i965 has done this same reassociation indirectly for years.
We would previously allow nir_op_flrp on all pre-Gen11 hardware even
though Gen4 and Gen5 do not have a LRP instruction.  Optimizations in
nir_opt_algebraic would convert expressions like a+c(b-a) into flrp(a,
b, c).  On Gen7+, the hardware performs the same arithmetic as
a(1-c)+bc.  Gen6 seems to implement LRP as a+c(b-a).  On Gen4 and
Gen5, we would lower LRP to a sequence of instructions that implement
a(1-c)+bc.  The lowering happens after all constant folding, so we
would literally generate a 1+(-1) instruction sequence in this
scenario: one instruction to load either 1 or -1 in a register, and
another instruction to add either -1 or 1 to it.

This patch just cuts out the middle man.  Do the reassociation that
we've always done, but do it explicitly at a time when we can benefit
from other optimizations.
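
For readers unfamiliar with the tuple notation used in
nir_opt_algebraic.py, the rewrite can be sketched as a toy function
over nested tuples.  This is only a hypothetical stand-in: the
function name reassociate is made up, and the real nir_algebraic
machinery also handles commutativity, bit sizes, the inexact (~)
marker, and is_used_once, none of which is modeled here.

```python
def _is_op(expr, op):
    """True if expr is a 3-tuple of the form (op, x, y)."""
    return isinstance(expr, tuple) and len(expr) == 3 and expr[0] == op

def reassociate(expr):
    """Rewrite 1 + c*(b - 1) as (1 - c) + b*c; leave anything else alone."""
    # Match ('fadd', 1.0, ('fmul', c, ('fadd', b, -1.0)))
    if not (_is_op(expr, 'fadd') and expr[1] == 1.0 and _is_op(expr[2], 'fmul')):
        return expr
    c, inner = expr[2][1], expr[2][2]
    if not (_is_op(inner, 'fadd') and inner[2] == -1.0):
        return expr
    b = inner[1]
    # Replace with ('fadd', ('fadd', 1.0, ('fneg', c)), ('fmul', b, c))
    return ('fadd', ('fadd', 1.0, ('fneg', c)), ('fmul', b, c))

before = ('fadd', 1.0, ('fmul', 'c', ('fadd', 'b', -1.0)))
print(reassociate(before))
```

Expressions that do not match the pattern, such as
('fadd', 2.0, 'x'), are returned unchanged.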

A few cases that were hurt by "nir: Lower flrp(±1, b, c) and flrp(a,
±1, c) differently" are restored by this patch.  This includes a few
shaders in ET:QW.

I tried a similar thing for open-coded flrp(-1, b, c), and it hurt
instructions on 35 shaders for ILK without helping any.  The helped and
hurt cycle counts were about even.

No changes on any other Intel platforms.

Iron Lake
total instructions in shared programs: 7735001 -> 7727356 (-0.10%)
instructions in affected programs: 1094100 -> 1086455 (-0.70%)
helped: 3281
HURT: 64
helped stats (abs) min: 1 max: 6 x̄: 2.35 x̃: 2
helped stats (rel) min: 0.13% max: 12.00% x̄: 1.15% x̃: 0.83%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.24% max: 0.64% x̄: 0.39% x̃: 0.38%
95% mean confidence interval for instructions value: -2.32 -2.25
95% mean confidence interval for instructions %-change: -1.16% -1.08%
Instructions are helped.

total cycles in shared programs: 178021114 -> 177982922 (-0.02%)
cycles in affected programs: 20360622 -> 20322430 (-0.19%)
helped: 3022
HURT: 489
helped stats (abs) min: 2 max: 142 x̄: 13.33 x̃: 12
helped stats (rel) min: 0.01% max: 6.37% x̄: 0.52% x̃: 0.24%
HURT stats (abs)   min: 2 max: 328 x̄: 4.26 x̃: 4
HURT stats (rel)   min: 0.02% max: 1.55% x̄: 0.14% x̃: 0.11%
95% mean confidence interval for cycles value: -11.26 -10.50
95% mean confidence interval for cycles %-change: -0.45% -0.41%
Cycles are helped.

LOST:   7
GAINED: 0

GM45
total instructions in shared programs: 4762494 -> 4758409 (-0.09%)
instructions in affected programs: 628390 -> 624305 (-0.65%)
helped: 1751
HURT: 32
helped stats (abs) min: 1 max: 6 x̄: 2.35 x̃: 2
helped stats (rel) min: 0.12% max: 11.11% x̄: 1.08% x̃: 0.73%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.24% max: 0.60% x̄: 0.39% x̃: 0.37%
95% mean confidence interval for instructions value: -2.34 -2.24
95% mean confidence interval for instructions %-change: -1.11% -1.01%
Instructions are helped.

total cycles in shared programs: 121921660 -> 121897144 (-0.02%)
cycles in affected programs: 13682798 -> 13658282 (-0.18%)
helped: 1683
HURT: 360
helped stats (abs) min: 2 max: 142 x̄: 15.48 x̃: 14
helped stats (rel) min: 0.01% max: 6.37% x̄: 0.51% x̃: 0.22%
HURT stats (abs)   min: 2 max: 328 x̄: 4.25 x̃: 2
HURT stats (rel)   min: 0.02% max: 1.55% x̄: 0.14% x̃: 0.11%
95% mean confidence interval for cycles value: -12.60 -11.40
95% mean confidence interval for cycles %-change: -0.43% -0.37%
Cycles are helped.

LOST:   7
GAINED: 7

Signed-off-by: Ian Romanick <ian.d.romanick at intel.com>
---
 src/compiler/nir/nir_opt_algebraic.py | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py
index 1db6d7a2bfe..60b97e14b1a 100644
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -890,6 +890,9 @@ late_optimizations = [
    (('b2f(is_used_more_than_once)', ('inot', a)), ('bcsel', a, 0.0, 1.0)),
    (('fneg(is_used_more_than_once)', ('b2f', ('inot', a))), ('bcsel', a, -0.0, -1.0)),
 
+   (('~fadd@32', 1.0, ('fmul(is_used_once)', c , ('fadd', b, -1.0 ))), ('fadd', ('fadd', 1.0, ('fneg', c)), ('fmul', b, c)), 'options->lower_flrp32'),
+   (('~fadd@64', 1.0, ('fmul(is_used_once)', c , ('fadd', b, -1.0 ))), ('fadd', ('fadd', 1.0, ('fneg', c)), ('fmul', b, c)), 'options->lower_flrp64'),
+
    # we do these late so that we don't get in the way of creating ffmas
    (('fmin', ('fadd(is_used_once)', '#c', a), ('fadd(is_used_once)', '#c', b)), ('fadd', c, ('fmin', a, b))),
    (('fmax', ('fadd(is_used_once)', '#c', a), ('fadd(is_used_once)', '#c', b)), ('fadd', c, ('fmax', a, b))),
-- 
2.14.4


