[Mesa-dev] [PATCH 8/8] i965/nir: Use GCM and GVN in the first run of nir_optimize

Wed Jan 18 02:38:42 UTC 2017

Shader-db results on Sky Lake (Excluding Deus Ex: Mankind Divided):

   total instructions in shared programs: 11724449 -> 11706852 (-0.15%)
   instructions in affected programs: 1218950 -> 1201353 (-1.44%)
   helped: 2562
   HURT: 1208

   total cycles in shared programs: 109388578 -> 108934482 (-0.42%)
   cycles in affected programs: 50640234 -> 50186138 (-0.90%)
   helped: 16097
   HURT: 13858

   total loops in shared programs: 1828 -> 1824 (-0.22%)
   loops in affected programs: 8 -> 4 (-50.00%)
   helped: 4
   HURT: 0

   total spills in shared programs: 1930 -> 1926 (-0.21%)
   spills in affected programs: 1054 -> 1050 (-0.38%)
   helped: 4
   HURT: 5

   total fills in shared programs: 3651 -> 3635 (-0.44%)
   fills in affected programs: 2594 -> 2578 (-0.62%)
   helped: 4
   HURT: 5

   LOST:   15
   GAINED: 1

Some analysis was done of the hurt programs.  The vast majority of the
hurt programs were only hurt by 2-3 instructions.  Based on a very
sparse random sampling, most of those appear to be hurt either because
of slightly different MOVs or because they have a single block with a
discard and GCM moved the discard higher in the shader, which causes us
to need to emit a HALT which we didn't emit before.  The case with the
discard should actually be an improvement most of the time in spite of
being more instructions.  There were also a few larger shaders that were
hurt by around 10 instructions.

On the helped end of things, there were 74 shaders helped by over 10%
and most of those were on the order of 500 instructions.

Spilling seems to be a wash.  Some stuff is helped by around 10% and
others are hurt by about the same amount.  Each hurt application is also
helped by about the same amount.  The only app that's pure loss is
Orbital Explorer...

With "Deus Ex: Mankind Divided", the results aren't so good.  Running
GCM late helps substantially while running GCM early (like we do here)
hurts it pretty badly.  Given that running early is better for basically
everything else (I ran shader-db both ways), I'm recommending we go
early for now and try to figure out how to fix Deus Ex.
---
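Note (not part of the commit): a rough C analogy for why GCM is only
run before indirect lowering.  This is a sketch under assumptions, not
NIR code.  It assumes lowering turns one indexed load into a selection
over direct element loads; the function names, the array size N and
the select chain are all hypothetical and only illustrate the shape of
the problem described in the code comment of this patch.

```c
#include <stddef.h>

#define N 4   /* hypothetical array size, chosen only for illustration */

/* Before indirect lowering: one load per iteration.  Its address depends
 * on the loop-carried index, so code motion has to leave it in the loop. */
int sum_indirect(const int v[N], const int *idx, size_t count)
{
    int sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += v[idx[i]];
    return sum;
}

/* After indirect lowering (assumed shape): every element is read with a
 * constant index, so no load depends on the loop any more.  A global code
 * motion pass is then free to place all N loads ahead of the loop, and
 * all N values stay live for the whole loop body. */
int sum_lowered_hoisted(const int v[N], const int *idx, size_t count)
{
    int v0 = v[0], v1 = v[1], v2 = v[2], v3 = v[3];   /* hoisted loads */
    int sum = 0;
    for (size_t i = 0; i < count; i++) {
        int k = idx[i];
        sum += (k == 0) ? v0 : (k == 1) ? v1 : (k == 2) ? v2 : v3;
    }
    return sum;
}
```

Both functions return the same value for in-range indices.  The only
difference is how many values are live across the loop: one in the
first form, N in the second.  With a large array and a loop that can't
be unrolled, that difference is the register pressure problem.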
 src/mesa/drivers/dri/i965/brw_nir.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_nir.c b/src/mesa/drivers/dri/i965/brw_nir.c
index 999e1d2..a7038ef 100644
--- a/src/mesa/drivers/dri/i965/brw_nir.c
+++ b/src/mesa/drivers/dri/i965/brw_nir.c
@@ -455,7 +455,7 @@ brw_nir_lower_cs_shared(nir_shader *nir)
 
 static nir_shader *
 nir_optimize(nir_shader *nir, const struct brw_compiler *compiler,
-             bool is_scalar)
+             bool run_gcm, bool is_scalar)
 {
    nir_variable_mode indirect_mask = 0;
    if (compiler->glsl_compiler_options[nir->stage].EmitNoIndirectInput)
@@ -501,6 +501,17 @@ nir_optimize(nir_shader *nir, const struct brw_compiler *compiler,
          OPT(nir_opt_loop_unroll, indirect_mask);
       }
       OPT(nir_opt_remove_phis);
+
+      /* We only want to run global code motion in the early stages of
+       * compilation.  In particular, we want to run it before we lower
+       * indirects away.  If we run GCM after indirect lowering, all of the
+       * loads stop being dependent on the loop and GCM pulls them out.  This
+       * can lead to massive register pressure problems for shaders with loops
+       * we can't unroll.
+       */
+      if (run_gcm)
+         OPT(nir_opt_gcm, true);
+
       OPT(nir_opt_undef);
       OPT_V(nir_lower_doubles, nir_lower_drcp |
                                nir_lower_dsqrt |
@@ -557,7 +568,7 @@ brw_preprocess_nir(const struct brw_compiler *compiler, nir_shader *nir)
 
    OPT(nir_split_var_copies);
 
-   nir = nir_optimize(nir, compiler, is_scalar);
+   nir = nir_optimize(nir, compiler, true, is_scalar);
 
    if (is_scalar) {
       OPT_V(nir_lower_load_const_to_scalar);
@@ -579,7 +590,7 @@ brw_preprocess_nir(const struct brw_compiler *compiler, nir_shader *nir)
    nir_lower_indirect_derefs(nir, indirect_mask);
 
    /* Get rid of split copies */
-   nir = nir_optimize(nir, compiler, is_scalar);
+   nir = nir_optimize(nir, compiler, false, is_scalar);
 
    OPT(nir_remove_dead_variables, nir_var_local);
 
@@ -604,13 +615,12 @@ brw_postprocess_nir(nir_shader *nir, const struct brw_compiler *compiler,
    bool progress; /* Written by OPT and OPT_V */
    (void)progress;
 
-
    do {
       progress = false;
       OPT(nir_opt_algebraic_before_ffma);
    } while (progress);
 
-   nir = nir_optimize(nir, compiler, is_scalar);
+   nir = nir_optimize(nir, compiler, false, is_scalar);
 
    if (devinfo->gen >= 6) {
       /* Try and fuse multiply-adds */
@@ -703,7 +713,7 @@ brw_nir_apply_sampler_key(nir_shader *nir,
 
    if (nir_lower_tex(nir, &tex_options)) {
       nir_validate_shader(nir);
-      nir = nir_optimize(nir, compiler, is_scalar);
+      nir = nir_optimize(nir, compiler, false, is_scalar);
    }
 
    return nir;
-- 
2.5.0.400.gff86faf