[Mesa-dev] [PATCH v2] i965/nir: do int64 lowering before optimization

Mon Feb 5 13:40:31 UTC 2018

Otherwise loop unrolling will fail to see the actual cost of
the unrolling operations when the loop body contains 64-bit integer
instructions, and very specially when the divmod64 lowering applies,
since its lowering is quite expensive.

Without this change, some in-development CTS tests for int64
get stuck forever trying to register allocate a shader with
over 50K SSA values. The large number of SSA values is the result
of NIR first unrolling multiple seemingly simple loops that involve
int64 instructions, only to then lower these instructions to produce
a massive pile of code (due to the divmod64 lowering in the unrolled
instructions).

With this change, loop unrolling will see the loops with the int64
code already lowered and will realize that it is too expensive to
unroll.

v2: Run nir_algebraic first so we can hopefully get rid of some of
    the int64 instructions before we even attempt to lower them.
---

For reference, I captured execution times for the CTS tests that
raised the problem. This is with debug builds of Mesa and CTS so
they are not ideal, but I think they are sufficient to see the
imapact of the patch.

With this patch: 52s
With this v1:    56s
With master:     1m:38s (*)

(*) This is actually a significant improvement that has happened in
master since we sent the original patch. Originally, the tests would
just hang forever trying to compile.

 src/intel/compiler/brw_nir.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
index dbddef0d04..287bd4908d 100644
--- a/src/intel/compiler/brw_nir.c
+++ b/src/intel/compiler/brw_nir.c
@@ -634,6 +634,18 @@ brw_preprocess_nir(const struct brw_compiler *compiler, nir_shader *nir)
 
    OPT(nir_split_var_copies);
 
+   /* Run opt_algebraic before int64 lowering so we can hopefully get rid
+    * of some int64 instructions.
+    */
+   OPT(nir_opt_algebraic);
+
+   /* Lower int64 instructions before nir_optimize so that loop unrolling
+    * sees their actual cost.
+    */
+   nir_lower_int64(nir, nir_lower_imul64 |
+                        nir_lower_isign64 |
+                        nir_lower_divmod64);
+
    nir = brw_nir_optimize(nir, compiler, is_scalar);
 
    if (is_scalar) {
@@ -661,10 +673,6 @@ brw_preprocess_nir(const struct brw_compiler *compiler, nir_shader *nir)
       brw_nir_no_indirect_mask(compiler, nir->info.stage);
    nir_lower_indirect_derefs(nir, indirect_mask);
 
-   nir_lower_int64(nir, nir_lower_imul64 |
-                        nir_lower_isign64 |
-                        nir_lower_divmod64);
-
    /* Get rid of split copies */
    nir = brw_nir_optimize(nir, compiler, is_scalar);
 
-- 
2.14.1