[Mesa-dev] [RFC PATCH] nir: Transform 4*x into x << 2 during late optimizations.

Fri May 8 03:36:27 PDT 2015

According to Glenn, shifts on R600 have 5x the throughput of multiplies.

Intel GPUs have strange integer multiplication restrictions - on most
hardware, MUL actually only does a 32-bit x 16-bit multiply.  This
means the two operands aren't interchangeable, which can limit our
constant propagation options.  SHL has no such restrictions.

Shifting is probably reasonable on most people's hardware, so let's just
do that.

i965 shader-db results (using NIR for VS):
total instructions in shared programs: 7432587 -> 7388982 (-0.59%)
instructions in affected programs:     1360411 -> 1316806 (-3.21%)
helped:                                5772
HURT:                                  0

Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
Cc: mattst88 at gmail.com
Cc: jason at jlekstrand.net
---
 src/glsl/nir/nir_opt_algebraic.py | 5 +++++
 1 file changed, 5 insertions(+)

So...I found a bizarre issue with this patch.

   (('imul', 4, a), ('ishl', a, 2)),

totally optimizes things.  However...

   (('imul', a, 4), ('ishl', a, 2)),

doesn't seem to do anything, even though imul is commutative, and nir_search
should totally handle that...

     ▄▄      ▄▄    ▄▄     ▄▄▄▄▄▄▄▄   ▄▄▄▄▄       ▄▄
     ██      ██   ████    ▀▀▀██▀▀▀  █▀▀▀▀██      ██
     ▀█▄ ██ ▄█▀   ████       ██         ▄█▀      ██
      ██ ██ ██   ██  ██      ██       ▄██▀       ██
      ███▀▀███   ██████      ██       ██         ▀▀
      ███  ███  ▄██  ██▄     ██       ▄▄         ▄▄
      ▀▀▀  ▀▀▀  ▀▀    ▀▀     ▀▀       ▀▀         ▀▀

If you know why, let me know; otherwise I may have to look into it when I'm
more awake.
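
For reference, here is a toy model of the behaviour I'd expect from a
commutative match.  This is only a sketch and is not how nir_search is
actually implemented: expressions are plain nested tuples, pattern variables
are strings, and the names COMMUTATIVE, match, substitute and rewrite are all
made up for illustration.

```python
# Toy pattern matcher (NOT Mesa's nir_search). Expressions are nested tuples
# such as ('imul', 'x', 4); string leaves stand for variables/SSA values.
COMMUTATIVE = {'imul', 'iadd'}  # assumed set, for illustration only


def match(pattern, expr, env):
    """Return an extended copy of env if pattern matches expr, else None."""
    if isinstance(pattern, str):  # pattern variable: bind or check binding
        if pattern in env:
            return env if env[pattern] == expr else None
        new_env = dict(env)
        new_env[pattern] = expr
        return new_env
    if not isinstance(pattern, tuple):  # constant: must be equal
        return env if pattern == expr else None
    if (not isinstance(expr, tuple) or len(expr) != len(pattern)
            or expr[0] != pattern[0]):
        return None
    # Try the operands as written, then swapped for commutative binary ops.
    orders = [expr[1:]]
    if pattern[0] in COMMUTATIVE and len(expr) == 3:
        orders.append((expr[2], expr[1]))
    for operands in orders:
        cur = env
        for p, e in zip(pattern[1:], operands):
            cur = match(p, e, cur)
            if cur is None:
                break
        else:
            return cur
    return None


def substitute(template, env):
    """Build the replacement expression from the variable bindings."""
    if isinstance(template, str):
        return env[template]
    if isinstance(template, tuple):
        return (template[0],) + tuple(substitute(t, env) for t in template[1:])
    return template


def rewrite(rule, expr):
    """Apply a (search, replace) rule at the root of expr, if it matches."""
    search, replace = rule
    env = match(search, expr, {})
    return expr if env is None else substitute(replace, env)


RULE = (('imul', 4, 'a'), ('ishl', 'a', 2))
print(rewrite(RULE, ('imul', 4, 'x')))
print(rewrite(RULE, ('imul', 'x', 4)))
```

In this model a single rule written as ('imul', 4, a) rewrites both operand
orders, which is what I assumed the real pass would do too.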

diff --git a/src/glsl/nir/nir_opt_algebraic.py b/src/glsl/nir/nir_opt_algebraic.py
index 400d60e..350471f 100644
--- a/src/glsl/nir/nir_opt_algebraic.py
+++ b/src/glsl/nir/nir_opt_algebraic.py
@@ -247,6 +247,11 @@ late_optimizations = [
    (('fge', ('fadd', a, b), 0.0), ('fge', a, ('fneg', b))),
    (('feq', ('fadd', a, b), 0.0), ('feq', a, ('fneg', b))),
    (('fne', ('fadd', a, b), 0.0), ('fne', a, ('fneg', b))),
+
+   # Multiplication by 4 comes up fairly often in indirect offset calculations.
+   # Some GPUs have weird integer multiplication limitations, but shifts should work
+   # equally well everywhere.
+   (('imul', 4, a), ('ishl', a, 2)),
 ]
 
 print nir_algebraic.AlgebraicPass("nir_opt_algebraic", optimizations).render()
-- 
2.4.0