Mesa (main): intel/fs: Use OPT() for split_virtual_grfs

Sat Dec 18 02:05:02 UTC 2021

Module: Mesa
Branch: main
Commit: cf98a3cc19879d67d71023a4a85c16c2de560150
URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=cf98a3cc19879d67d71023a4a85c16c2de560150

Author: Jason Ekstrand <jason at jlekstrand.net>
Date:   Tue Nov  9 14:38:48 2021 -0600

intel/fs: Use OPT() for split_virtual_grfs

Now that we're being conservative in the pass, it's easy to tell when it
makes progress and we can put it in the OPT() macro.  This way, we get
nice INTEL_DEBUG=optimizer dumps for it.  While we're here, fix the
header comment which is massively out-of-date.
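
For readers unfamiliar with the pattern, here is a minimal, self-contained
sketch of why a bool return value is what lets a pass go through an
OPT()-style wrapper.  It is not the Mesa code: ToyShader, PassRunner and the
toy split_virtual_grfs() below are hypothetical stand-ins, the real OPT() is
a macro inside fs_visitor::optimize(), and the real pass only splits where
no instruction needs the registers to stay contiguous.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for a shader: one entry per virtual register,
// holding its size in hardware registers.
struct ToyShader {
   std::vector<int> vgrf_sizes;
};

// Toy version of the pass: split every multi-register entry into
// single registers.  Returns true only if something actually changed,
// which is the property the wrapper below relies on.
static bool split_virtual_grfs(ToyShader &s)
{
   std::vector<int> out;
   bool progress = false;

   for (int size : s.vgrf_sizes) {
      assert(size >= 1);
      if (size > 1) {
         progress = true;
         out.insert(out.end(), size, 1);
      } else {
         out.push_back(size);
      }
   }

   s.vgrf_sizes = out;
   return progress;
}

// Simplified analogue of the OPT() bookkeeping: number each pass,
// optionally report it when it made progress, and accumulate overall
// progress for the surrounding optimization loop.
struct PassRunner {
   int pass_num = 0;
   bool progress = false;
   bool debug = false;

   template <typename Pass>
   bool run(const char *name, Pass pass, ToyShader &s)
   {
      pass_num++;
      bool this_progress = pass(s);

      if (debug && this_progress)
         std::printf("pass %d: %s made progress\n", pass_num, name);

      progress = progress || this_progress;
      return this_progress;
   }
};
```

With this shape, calling run() twice on the same shader reports progress the
first time and none the second, so a loop that repeats until no pass makes
progress terminates.  A pass returning void gives the wrapper nothing to
record or print.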

Reviewed-by: Emma Anholt <emma at anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick at intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>

---

 src/intel/compiler/brw_fs.cpp | 41 +++++++++++++++++++++--------------------
 src/intel/compiler/brw_fs.h   |  2 +-
 2 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 924741e01f2..03733ce719c 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -2049,22 +2049,17 @@ fs_visitor::assign_gs_urb_setup()
 /**
  * Split large virtual GRFs into separate components if we can.
  *
- * This is mostly duplicated with what brw_fs_vector_splitting does,
- * but that's really conservative because it's afraid of doing
- * splitting that doesn't result in real progress after the rest of
- * the optimization phases, which would cause infinite looping in
- * optimization.  We can do it once here, safely.  This also has the
- * opportunity to split interpolated values, or maybe even uniforms,
- * which we don't have at the IR level.
- *
- * We want to split, because virtual GRFs are what we register
- * allocate and spill (due to contiguousness requirements for some
- * instructions), and they're what we naturally generate in the
- * codegen process, but most virtual GRFs don't actually need to be
- * contiguous sets of GRFs.  If we split, we'll end up with reduced
- * live intervals and better dead code elimination and coalescing.
+ * This pass aggressively splits VGRFs into as small a chunks as possible,
+ * down to single registers if it can.  If no VGRFs can be split, we return
+ * false so this pass can safely be used inside an optimization loop.  We
+ * want to split, because virtual GRFs are what we register allocate and
+ * spill (due to contiguousness requirements for some instructions), and
+ * they're what we naturally generate in the codegen process, but most
+ * virtual GRFs don't actually need to be contiguous sets of GRFs.  If we
+ * split, we'll end up with reduced live intervals and better dead code
+ * elimination and coalescing.
  */
-void
+bool
 fs_visitor::split_virtual_grfs()
 {
    /* Compact the register file so we eliminate dead vgrfs.  This
@@ -2180,8 +2175,11 @@ fs_visitor::split_virtual_grfs()
    }
    assert(reg == reg_count);
 
-   if (!has_splits)
+   bool progress;
+   if (!has_splits) {
+      progress = false;
       goto cleanup;
+   }
 
    foreach_block_and_inst_safe(block, fs_inst, inst, cfg) {
       if (inst->opcode == SHADER_OPCODE_UNDEF) {
@@ -2236,11 +2234,15 @@ fs_visitor::split_virtual_grfs()
    }
    invalidate_analysis(DEPENDENCY_INSTRUCTION_DETAIL | DEPENDENCY_VARIABLES);
 
+   progress = true;
+
 cleanup:
    delete[] split_points;
    delete[] vgrf_has_split;
    delete[] new_virtual_grf;
    delete[] new_reg_offset;
+
+   return progress;
 }
 
 /**
@@ -8280,9 +8282,6 @@ fs_visitor::optimize()
 
    validate();
 
-   split_virtual_grfs();
-   validate();
-
 #define OPT(pass, args...) ({                                           \
       pass_num++;                                                       \
       bool this_progress = pass(args);                                  \
@@ -8313,6 +8312,8 @@ fs_visitor::optimize()
    int iteration = 0;
    int pass_num = 0;
 
+   OPT(split_virtual_grfs);
+
    /* Before anything else, eliminate dead code.  The results of some NIR
     * instructions may effectively be calculated twice.  Once when the
     * instruction is encountered, and again when the user of that result is
@@ -8385,7 +8386,7 @@ fs_visitor::optimize()
    OPT(opt_redundant_halt);
 
    if (OPT(lower_load_payload)) {
-      split_virtual_grfs();
+      OPT(split_virtual_grfs);
 
       /* Lower 64 bit MOVs generated by payload lowering. */
       if (!devinfo->has_64bit_float && !devinfo->has_64bit_int)
diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h
index 2bf53440c9b..fe22fca12db 100644
--- a/src/intel/compiler/brw_fs.h
+++ b/src/intel/compiler/brw_fs.h
@@ -148,7 +148,7 @@ public:
    void assign_regs_trivial();
    void calculate_payload_ranges(int payload_node_count,
                                  int *payload_last_use_ip) const;
-   void split_virtual_grfs();
+   bool split_virtual_grfs();
    bool compact_virtual_grfs();
    void assign_constant_locations();
    bool get_pull_locs(const fs_reg &src, unsigned *out_surf_index,