[Mesa-dev] [RFC PATCH] nir: Transform 4*x into x << 2 during late optimizations.

Fri May 8 11:11:30 PDT 2015

On 05/08/2015 03:36 AM, Kenneth Graunke wrote:
> According to Glenn, shifts on R600 have 5x the throughput as multiplies.
> 
> Intel GPUs have strange integer multiplication restrictions - on most
> hardware, MUL actually only does a 32-bit x 16-bit multiply.  This
> means the arguments aren't commutative, which can limit our constant
> propagation options.  SHL has no such restrictions.
> 
> Shifting is probably reasonable on most people's hardware, so let's just
> do that.
> 
> i965 shader-db results (using NIR for VS):
> total instructions in shared programs: 7432587 -> 7388982 (-0.59%)
> instructions in affected programs:     1360411 -> 1316806 (-3.21%)
> helped:                                5772
> HURT:                                  0
> 
> Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
> Cc: mattst88 at gmail.com
> Cc: jason at jlekstrand.net
> ---
>  src/glsl/nir/nir_opt_algebraic.py | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> So...I found a bizarre issue with this patch.
> 
>    (('imul', 4, a), ('ishl', a, 2)),
> 
> totally optimizes things.  However...
> 
>    (('imul', a, 4), ('ishl', a, 2)),
> 
> doesn't seem to do anything, even though imul is commutative, and nir_search
> should totally handle that...
> 
>      ▄▄      ▄▄    ▄▄     ▄▄▄▄▄▄▄▄   ▄▄▄▄▄       ▄▄
>      ██      ██   ████    ▀▀▀██▀▀▀  █▀▀▀▀██      ██
>      ▀█▄ ██ ▄█▀   ████       ██         ▄█▀      ██
>       ██ ██ ██   ██  ██      ██       ▄██▀       ██
>       ███▀▀███   ██████      ██       ██         ▀▀
>       ███  ███  ▄██  ██▄     ██       ▄▄         ▄▄
>       ▀▀▀  ▀▀▀  ▀▀    ▀▀     ▀▀       ▀▀         ▀▀
> 
> If you know why, let me know, otherwise I may have to look into it when more
> awake.
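
For reference, here is a toy sketch of what commutative matching is expected to do. It is not nir_search and does not explain the bug; the match() helper, the COMMUTATIVE set, and the tuple encoding of expressions are all made up for illustration. It only shows that, once both source orders are tried, the two patterns above are interchangeable.

```python
# Toy matcher, NOT Mesa's nir_search. Pattern variables are strings,
# constants are ints, expressions are (opcode, src0, src1) tuples.
COMMUTATIVE = {'imul', 'iadd'}  # hypothetical opcode set for this sketch


def match(pattern, expr, bindings=None):
    """Return a dict of variable bindings if pattern matches expr, else None."""
    bindings = {} if bindings is None else bindings
    if isinstance(pattern, str):
        # A variable matches anything, but must bind consistently.
        if pattern in bindings and bindings[pattern] != expr:
            return None
        return {**bindings, pattern: expr}
    if isinstance(pattern, int):
        # A constant only matches the same constant.
        return bindings if pattern == expr else None
    if (not isinstance(expr, tuple) or len(expr) != len(pattern)
            or expr[0] != pattern[0]):
        return None
    # Try the sources as written, then swapped if the opcode commutes.
    orders = [expr[1:]]
    if pattern[0] in COMMUTATIVE and len(expr) == 3:
        orders.append((expr[2], expr[1]))
    for srcs in orders:
        trial = bindings
        for p, e in zip(pattern[1:], srcs):
            trial = match(p, e, trial)
            if trial is None:
                break
        else:
            return trial
    return None
```

With this sketch, match(('imul', 'a', 4), ('imul', 4, 'x')) and match(('imul', 4, 'a'), ('imul', 4, 'x')) both bind a to x, while a non-commutative opcode such as ishl is only tried in the written order.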

I've noticed a couple of other weird things that I have been unable to
understand.  Shaders like the one below end with fmul/ffma instead of
flrp, for example.  I understand why that happens from GLSL IR
opt_algebraic, but it seems like nir_opt_algebraic should handle it.

[require]
GLSL >= 1.30

[vertex shader]
in vec4 v;
in vec2 tc_in;

out vec2 tc;

void main() {
    gl_Position = v;
    tc = tc_in;
}

[fragment shader]
in vec2 tc;

out vec4 color;

uniform sampler2D s;
uniform float a;
uniform vec3 base_color;

void main() {
    vec3 tex_color = texture(s, tc).xyz;

    color.xyz = (base_color * a) + (tex_color * (1.0 - a));
    color.a = 1.0;
}
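
To make the expectation concrete: per component, the fragment shader's blend is a linear interpolation, so it should be expressible as a single flrp with tex_color and base_color as the endpoints and a as the weight. The Python check below is only a scalar sketch of that identity; the function names are illustrative, and it says nothing about which instructions the backend actually emits.

```python
def flrp(x, y, a):
    """Linear interpolation as flrp / GLSL mix() define it: x*(1-a) + y*a."""
    return x * (1.0 - a) + y * a


def shader_blend(base_color, tex_color, a):
    """One component of the expression written in the fragment shader."""
    return (base_color * a) + (tex_color * (1.0 - a))


def same_result(base_color, tex_color, a):
    """True if the shader expression equals flrp(tex_color, base_color, a)."""
    return shader_blend(base_color, tex_color, a) == flrp(tex_color, base_color, a)
```

The two forms differ only in the order of the operands to the final add, so they agree exactly, not just approximately.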

> diff --git a/src/glsl/nir/nir_opt_algebraic.py b/src/glsl/nir/nir_opt_algebraic.py
> index 400d60e..350471f 100644
> --- a/src/glsl/nir/nir_opt_algebraic.py
> +++ b/src/glsl/nir/nir_opt_algebraic.py
> @@ -247,6 +247,11 @@ late_optimizations = [
>     (('fge', ('fadd', a, b), 0.0), ('fge', a, ('fneg', b))),
>     (('feq', ('fadd', a, b), 0.0), ('feq', a, ('fneg', b))),
>     (('fne', ('fadd', a, b), 0.0), ('fne', a, ('fneg', b))),
> +
> +   # Multiplication by 4 comes up fairly often in indirect offset calculations.
> +   # Some GPUs have weird integer multiplication limitations, but shifts should work
> +   # equally well everywhere.
> +   (('imul', 4, a), ('ishl', a, 2)),

This should be conditionalized on whether the platform has native integers.
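
A rough sketch of what that could look like, using the optional third tuple element that nir_opt_algebraic.py rules accept as a C condition string. The 'options->native_integers' name is an assumption here, not something taken from this patch; substitute whatever the compiler options actually expose. The split_rule() helper and the 32-bit arithmetic helpers are also hypothetical and exist only so the snippet runs on its own.

```python
a = 'a'  # pattern variables in nir_opt_algebraic.py are plain strings

# ASSUMPTION: 'options->native_integers' is a placeholder condition name.
conditional_rule = (('imul', 4, a), ('ishl', a, 2), 'options->native_integers')


def split_rule(rule):
    """Return (search, replace, condition); condition defaults to 'true'."""
    search, replace = rule[0], rule[1]
    condition = rule[2] if len(rule) > 2 else 'true'
    return search, replace, condition


MASK32 = 0xFFFFFFFF


def imul32(x, y):
    """32-bit integer multiply with wraparound."""
    return (x * y) & MASK32


def ishl32(x, shift):
    """32-bit left shift with wraparound."""
    return (x << (shift & 31)) & MASK32


def rewrite_is_exact(samples):
    """Check 4*x == x << 2 modulo 2**32 for every sample value."""
    return all(imul32(4, x) == ishl32(x, 2) for x in samples)
```

The rewrite itself is exact for 32-bit integers, including values that overflow; the condition only guards against applying it where integers are not real integers.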

>  ]
>  
>  print nir_algebraic.AlgebraicPass("nir_opt_algebraic", optimizations).render()
>