[Mesa-dev] [PATCH 3/3] i965/nir: use vectorization for non-scalar stages

Wed Oct 17 18:33:53 UTC 2018

From: Connor Abbott <cwabbott0 at gmail.com>

Shader-db results on Haswell:

    total instructions in shared programs: 2180337 -> 2154080 (-1.20%)
    instructions in affected programs: 959766 -> 933509 (-2.74%)
    helped: 5653
    HURT: 2560

    total cycles in shared programs: 12339326 -> 12307102 (-0.26%)
    cycles in affected programs: 6102794 -> 6070570 (-0.53%)
    helped: 3838
    HURT: 4868

Most of the hurt programs seem to be hurt because vectorizing makes us
generate extra MOVs. For example, in
shaders/non-free/steam/anomaly-2/158.shader_test, this:

add(8)          g116<1>.xyF     g12<4,4,1>.xyyyF g1.4<0,4,1>.xyyyF { align16 NoDDClr 1Q };
add(8)          g117<1>.xyF     g12<4,4,1>.xyyyF g1.4<0,4,1>.zwwwF { align16 NoDDClr 1Q };
add(8)          g116<1>.zwF     g12<4,4,1>.xxxyF -g1.4<0,4,1>.xxxyF { align16 NoDDChk 1Q };
add(8)          g117<1>.zwF     g12<4,4,1>.xxxyF -g1.4<0,4,1>.zzzwF { align16 NoDDChk 1Q };

Turns into this:

add(8)          g13<1>F         g12<4,4,1>.xyxyF g1.4<0,4,1>F   { align16 1Q };
add(8)          g14<1>F         g12<4,4,1>.xyxyF -g1.4<0,4,1>F  { align16 1Q };
mov(8)          g116<1>.xyD     g13<4,4,1>.xyyyD                { align16 NoDDClr 1Q };
mov(8)          g117<1>.xyD     g13<4,4,1>.zwwwD                { align16 NoDDClr 1Q };
mov(8)          g116<1>.zwD     g14<4,4,1>.xxxyD                { align16 NoDDChk 1Q };
mov(8)          g117<1>.zwD     g14<4,4,1>.zzzwD                { align16 NoDDChk 1Q };

So we eliminated two adds, but then had to introduce four movs to
transpose the result, for a net increase of two instructions.  Some of
the hurt is because vectorization is a bit over-aggressive: we vectorize
something when we should have left it as a scalar and CSEd it.
Unfortunately, fixing this is really tricky, as it involves the
interactions between many different components.
---
 src/intel/compiler/brw_nir.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
index 297845b89b7..564fd004a94 100644
--- a/src/intel/compiler/brw_nir.c
+++ b/src/intel/compiler/brw_nir.c
@@ -568,6 +568,12 @@ brw_nir_optimize(nir_shader *nir, const struct brw_compiler *compiler,
       OPT(nir_copy_prop);
       OPT(nir_opt_dce);
       OPT(nir_opt_cse);
+
+      if (!is_scalar) {
+         OPT(nir_opt_vectorize);
+         OPT(nir_copy_prop);
+      }
+
       OPT(nir_opt_peephole_select, 0);
       OPT(nir_opt_intrinsics);
       OPT(nir_opt_algebraic);
-- 
2.19.1