[Mesa-dev] [PATCH 3/3] i965/nir: use vectorization for non-scalar stages
Ian Romanick
idr at freedesktop.org
Thu Oct 18 19:49:04 UTC 2018
On 10/17/2018 11:33 AM, Jason Ekstrand wrote:
> From: Connor Abbott <cwabbott0 at gmail.com>
>
> Shader-db results on Haswell:
>
> total instructions in shared programs: 2180337 -> 2154080 (-1.20%)
> instructions in affected programs: 959766 -> 933509 (-2.74%)
> helped: 5653
> HURT: 2560
>
> total cycles in shared programs: 12339326 -> 12307102 (-0.26%)
> cycles in affected programs: 6102794 -> 6070570 (-0.53%)
> helped: 3838
> HURT: 4868

In cases like this, the additional statistics generated by my changes to
report.py can be informative. Give me a few minutes, and I'll gather
that data.

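For reference, the percentages in the quoted summary follow directly from the before/after totals. The helper below is only a sketch of that arithmetic; `pct_change` and `print_change` are hypothetical names used for illustration and are not part of shader-db or report.py.

```c
#include <assert.h>
#include <stdio.h>

/* Relative change from `before` to `after`, in percent.
 * A negative result means the count went down. */
double pct_change(long before, long after)
{
   assert(before != 0);
   return 100.0 * (double)(after - before) / (double)before;
}

/* Print one line in a format loosely modeled on the quoted summary. */
void print_change(const char *label, long before, long after)
{
   printf("%s: %ld -> %ld (%.2f%%)\n", label, before, after,
          pct_change(before, after));
}
```

Fed the four before/after pairs from the quoted results, `pct_change` gives values that round to -1.20, -2.74, -0.26 and -0.53. The absolute reduction is the same in the "total" and "affected" rows (26257 instructions, 32224 cycles), which is what one would expect if only the affected programs changed.
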
> Most of the hurt programs seem to be because we generate extra MOV's due
> to vectorizing things. For example, in
> shaders/non-free/steam/anomaly-2/158.shader_test, this:
>
> add(8) g116<1>.xyF g12<4,4,1>.xyyyF g1.4<0,4,1>.xyyyF { align16 NoDDClr 1Q };
> add(8) g117<1>.xyF g12<4,4,1>.xyyyF g1.4<0,4,1>.zwwwF { align16 NoDDClr 1Q };
> add(8) g116<1>.zwF g12<4,4,1>.xxxyF -g1.4<0,4,1>.xxxyF { align16 NoDDChk 1Q };
> add(8) g117<1>.zwF g12<4,4,1>.xxxyF -g1.4<0,4,1>.zzzwF { align16 NoDDChk 1Q };
>
> Turns into this:
>
> add(8) g13<1>F g12<4,4,1>.xyxyF g1.4<0,4,1>F { align16 1Q };
> add(8) g14<1>F g12<4,4,1>.xyxyF -g1.4<0,4,1>F { align16 1Q };
> mov(8) g116<1>.xyD g13<4,4,1>.xyyyD { align16 NoDDClr 1Q };
> mov(8) g117<1>.xyD g13<4,4,1>.zwwwD { align16 NoDDClr 1Q };
> mov(8) g116<1>.zwD g14<4,4,1>.xxxyD { align16 NoDDChk 1Q };
> mov(8) g117<1>.zwD g14<4,4,1>.zzzwD { align16 NoDDChk 1Q };
>
> So we eliminated two add's, but then had to introduce four mov's to
> transpose the result. Some of the hurt is because vectorization is a bit
> over-aggressive and we vectorize something when we should have left it
> as a scalar and CSEd it. Unfortunately, this is all really tricky to do
> as it involves the interactions between many different components.

It seems to me that vectorization should be done later in the
optimization pipeline. I would have guessed that it would go after the
regular optimization loop. Did you try calling it from other places to
see the effects?

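To make the transposition in the quoted anomaly-2 example concrete, here is a small C model of the two listings. It is only a sketch under my reading of the swizzles and writemasks: registers are modeled as plain four-float vectors, `g1` stands in for the `g1.4` region, and the `D`-typed MOVs are modeled as float copies. The type and function names are hypothetical.

```c
#include <assert.h>

typedef struct { float c[4]; } vec4;   /* c[0..3] = x, y, z, w */

/* "Before": four adds, each writing two channels of a destination. */
void adds_before(const vec4 *g12, const vec4 *g1, vec4 *g116, vec4 *g117)
{
   g116->c[0] = g12->c[0] + g1->c[0];   /* g116.xy = g12.xy + g1.xy */
   g116->c[1] = g12->c[1] + g1->c[1];
   g117->c[0] = g12->c[0] + g1->c[2];   /* g117.xy = g12.xy + g1.zw */
   g117->c[1] = g12->c[1] + g1->c[3];
   g116->c[2] = g12->c[0] - g1->c[0];   /* g116.zw = g12.xy - g1.xy */
   g116->c[3] = g12->c[1] - g1->c[1];
   g117->c[2] = g12->c[0] - g1->c[2];   /* g117.zw = g12.xy - g1.zw */
   g117->c[3] = g12->c[1] - g1->c[3];
}

/* "After": two full-width adds, then four MOVs to transpose the result. */
void adds_after(const vec4 *g12, const vec4 *g1, vec4 *g116, vec4 *g117)
{
   vec4 g13, g14;
   for (int i = 0; i < 4; i++) {
      g13.c[i] = g12->c[i % 2] + g1->c[i];   /* g12.xyxy + g1.xyzw */
      g14.c[i] = g12->c[i % 2] - g1->c[i];   /* g12.xyxy - g1.xyzw */
   }
   g116->c[0] = g13.c[0];  g116->c[1] = g13.c[1];   /* g116.xy = g13.xy */
   g117->c[0] = g13.c[2];  g117->c[1] = g13.c[3];   /* g117.xy = g13.zw */
   g116->c[2] = g14.c[0];  g116->c[3] = g14.c[1];   /* g116.zw = g14.xy */
   g117->c[2] = g14.c[2];  g117->c[3] = g14.c[3];   /* g117.zw = g14.zw */
}

/* Returns 1 if both listings produce identical g116 and g117. */
int listings_agree(const vec4 *g12, const vec4 *g1)
{
   vec4 a116, a117, b116, b117;
   adds_before(g12, g1, &a116, &a117);
   adds_after(g12, g1, &b116, &b117);
   for (int i = 0; i < 4; i++) {
      if (a116.c[i] != b116.c[i] || a117.c[i] != b117.c[i])
         return 0;
   }
   return 1;
}
```

Under this model both versions compute the same values, but the first takes four instructions and the second takes six, which is the net cost of two described in the commit message.
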
> ---
> src/intel/compiler/brw_nir.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
> index 297845b89b7..564fd004a94 100644
> --- a/src/intel/compiler/brw_nir.c
> +++ b/src/intel/compiler/brw_nir.c
> @@ -568,6 +568,12 @@ brw_nir_optimize(nir_shader *nir, const struct brw_compiler *compiler,
>        OPT(nir_copy_prop);
>        OPT(nir_opt_dce);
>        OPT(nir_opt_cse);
> +
> +      if (!is_scalar) {
> +         OPT(nir_opt_vectorize);
> +         OPT(nir_copy_prop);
> +      }
> +
>        OPT(nir_opt_peephole_select, 0);
>        OPT(nir_opt_intrinsics);
>        OPT(nir_opt_algebraic);
>