[Mesa-dev] [RFC PATCH] nir: Transform 4*x into x << 2 during late optimizations.
Jason Ekstrand
jason at jlekstrand.net
Fri May 8 10:05:51 PDT 2015
On Fri, May 8, 2015 at 3:36 AM, Kenneth Graunke <kenneth at whitecape.org> wrote:
> According to Glenn, shifts on R600 have 5x the throughput of multiplies.
>
> Intel GPUs have strange integer multiplication restrictions - on most
> hardware, MUL actually only does a 32-bit x 16-bit multiply. This
> means the arguments aren't commutative, which can limit our constant
> propagation options. SHL has no such restrictions.
>
> Shifting is probably reasonable on most people's hardware, so let's just
> do that.
>
> i965 shader-db results (using NIR for VS):
> total instructions in shared programs: 7432587 -> 7388982 (-0.59%)
> instructions in affected programs: 1360411 -> 1316806 (-3.21%)
> helped: 5772
> HURT: 0
>
> Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
> Cc: mattst88 at gmail.com
> Cc: jason at jlekstrand.net
> ---
> src/glsl/nir/nir_opt_algebraic.py | 5 +++++
> 1 file changed, 5 insertions(+)
>
> So...I found a bizarre issue with this patch.
>
> (('imul', 4, a), ('ishl', a, 2)),
>
> totally optimizes things. However...
>
> (('imul', a, 4), ('ishl', a, 2)),
>
> doesn't seem to do anything, even though imul is commutative, and nir_search
> should totally handle that...
>
> WAT?!
>
> If you know why, let me know, otherwise I may have to look into it when more
> awake.
I figured it out and I have a patch. Unfortunately, it regresses a
few programs and loses 8 SIMD8 programs, so I'm doing some more
investigation. I'll send it out soon.
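
For anyone following along, here is a rough sketch of what "handling
commutativity" means for a pattern matcher: if the operands don't match in
the order the pattern lists them, retry with the operands swapped. This is
a toy written for illustration, not the actual nir_search code. The tuple
representation, the COMMUTATIVE set, and all function names are made up.

```python
# Toy expression matcher; NOT the real nir_search implementation.
# Expressions are tuples: (opcode, operand0, operand1). In a pattern,
# strings are variables and ints are constants that must match exactly.

COMMUTATIVE = {"imul", "iadd"}  # assumed set, for illustration only


def match_operands(pattern_ops, expr_ops):
    """Return variable bindings if the operands match in order, else None."""
    bindings = {}
    for pat, val in zip(pattern_ops, expr_ops):
        if isinstance(pat, str):
            # Pattern variable such as 'a': bind it, or check the old binding.
            if bindings.setdefault(pat, val) != val:
                return None
        elif pat != val:
            # Constant in the pattern must equal the operand exactly.
            return None
    return bindings


def match(pattern, expr):
    """Match one binary expression, retrying with swapped operands."""
    if pattern[0] != expr[0]:
        return None
    bindings = match_operands(pattern[1:], expr[1:])
    if bindings is None and expr[0] in COMMUTATIVE:
        bindings = match_operands(pattern[1:], expr[1:][::-1])
    return bindings


def rewrite(expr):
    """Apply the single rule imul(4, a) -> ishl(a, 2)."""
    bindings = match(("imul", 4, "a"), expr)
    if bindings is None:
        return expr
    return ("ishl", bindings["a"], 2)
```

With the swapped retry in place, rewrite(("imul", "x", 4)) and
rewrite(("imul", 4, "x")) both give ("ishl", "x", 2), so only one ordering
of the rule needs to be written down.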
> diff --git a/src/glsl/nir/nir_opt_algebraic.py b/src/glsl/nir/nir_opt_algebraic.py
> index 400d60e..350471f 100644
> --- a/src/glsl/nir/nir_opt_algebraic.py
> +++ b/src/glsl/nir/nir_opt_algebraic.py
> @@ -247,6 +247,11 @@ late_optimizations = [
> (('fge', ('fadd', a, b), 0.0), ('fge', a, ('fneg', b))),
> (('feq', ('fadd', a, b), 0.0), ('feq', a, ('fneg', b))),
> (('fne', ('fadd', a, b), 0.0), ('fne', a, ('fneg', b))),
> +
> + # Multiplication by 4 comes up fairly often in indirect offset calculations.
> + # Some GPUs have weird integer multiplication limitations, but shifts should work
> + # equally well everywhere.
> + (('imul', 4, a), ('ishl', a, 2)),
> ]
>
> print nir_algebraic.AlgebraicPass("nir_opt_algebraic", optimizations).render()
> --
> 2.4.0
>
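
As a sanity check on the rule itself, the sketch below compares 4*x against
x << 2 under 32-bit wrapping arithmetic, which is how I'm assuming imul and
ishl behave on 32-bit values. The helper names are hypothetical and values
are modeled as unsigned bit patterns. Two's-complement signed values have
the same low 32 bits, so the comparison covers them as well.

```python
import random

MASK32 = 0xFFFFFFFF  # keep only the low 32 bits, modeling wraparound


def imul32(x, y):
    """32-bit integer multiply that discards overflow bits."""
    return (x * y) & MASK32


def ishl32(x, shift):
    """32-bit left shift that discards bits shifted past bit 31."""
    return (x << shift) & MASK32


def check_equivalence(samples=10000, seed=0):
    """Compare imul32(4, x) with ishl32(x, 2) on edge cases and random x."""
    rng = random.Random(seed)
    values = [0, 1, 0x3FFFFFFF, 0x40000000, 0x7FFFFFFF, 0x80000000, MASK32]
    values += [rng.getrandbits(32) for _ in range(samples)]
    return all(imul32(4, x) == ishl32(x, 2) for x in values)
```

The overflow cases are the interesting ones: both operations drop the same
two high bits, so the results agree even when the product wraps.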