Mesa (master): i965/fs: Strip trailing constant zeroes in sample messages

Neil Roberts nroberts at kemper.freedesktop.org
Fri May 1 10:51:41 UTC 2015


Module: Mesa
Branch: master
Commit: 02e9773bc8526f64e4d79e3d9ac11f49882c022f
URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=02e9773bc8526f64e4d79e3d9ac11f49882c022f

Author: Neil Roberts <neil at linux.intel.com>
Date:   Thu Apr 23 16:56:53 2015 -0700

i965/fs: Strip trailing constant zeroes in sample messages

If a send message is emitted with a message length that is less than
required for the message then the remaining parameters default to
zero. We can take advantage of this to save a register when a shader
passes constant zeroes as the final coordinates to the sample
function.

I think this might be useful for GLES applications that are using 2D
textures to simulate 1D textures.

On Skylake it will be useful for shaders that do
texelFetch(tex,something,0) which I think is fairly common. This helps
more on Skylake because in that case the order of the instruction
operands are u,v,lod,r which is good for 2D textures whereas before
they were u,lod,v,r which is only good for 1D textures.

On Haswell:
total instructions in shared programs: 8535730 -> 8533261 (-0.03%)
instructions in affected programs:     236968 -> 234499 (-1.04%)
helped:                                1174

On Skylake:
total instructions in shared programs: 10345646 -> 10341237 (-0.04%)
instructions in affected programs:     293011 -> 288602 (-1.50%)
helped:                                1218

Reviewed-by: Matt Turner <mattst88 at gmail.com>

v2: Applied suggestions by Kenneth Graunke:
    - Only apply on Gen5+
    - Apply to all texture opcodes, not just TEX and TXF.
    Moved the optimisation into the loop as suggested by Matt Turner.
    Fix the array index when there is a header.

---

 src/mesa/drivers/dri/i965/brw_fs.cpp |   49 ++++++++++++++++++++++++++++++++++
 src/mesa/drivers/dri/i965/brw_fs.h   |    1 +
 2 files changed, 50 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index a648323..ed86e0b 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2536,6 +2536,54 @@ fs_visitor::opt_algebraic()
 }
 
 /**
+ * Optimize sample messages that have constant zero values for the trailing
+ * texture coordinates. We can just reduce the message length for these
+ * instructions instead of reserving a register for it. Trailing parameters
+ * that aren't sent default to zero anyway. This will cause the dead code
+ * eliminator to remove the MOV instruction that would otherwise be emitted to
+ * set up the zero value.
+ */
+bool
+fs_visitor::opt_zero_samples()
+{
+   /* Gen4 infers the texturing opcode based on the message length so we can't
+    * change it.
+    */
+   if (devinfo->gen < 5)
+      return false;
+
+   bool progress = false;
+
+   foreach_block_and_inst(block, fs_inst, inst, cfg) {
+      if (!inst->is_tex())
+         continue;
+
+      fs_inst *load_payload = (fs_inst *) inst->prev;
+
+      if (load_payload->is_head_sentinel() ||
+          load_payload->opcode != SHADER_OPCODE_LOAD_PAYLOAD)
+         continue;
+
+      /* We don't want to remove the message header. Removing all of the
+       * parameters is avoided because it seems to cause a GPU hang but I
+       * can't find any documentation indicating that this is expected.
+       */
+      while (inst->mlen > inst->header_present + dispatch_width / 8 &&
+             load_payload->src[(inst->mlen - inst->header_present) /
+                               (dispatch_width / 8) +
+                               inst->header_present - 1].is_zero()) {
+         inst->mlen -= dispatch_width / 8;
+         progress = true;
+      }
+   }
+
+   if (progress)
+      invalidate_live_intervals();
+
+   return progress;
+}
+
+/**
  * Optimize sample messages which are followed by the final RT write.
  *
  * CHV, and GEN9+ can mark a texturing SEND instruction with EOT to have its
@@ -3818,6 +3866,7 @@ fs_visitor::optimize()
       OPT(opt_register_renaming);
       OPT(opt_redundant_discard_jumps);
       OPT(opt_saturate_propagation);
+      OPT(opt_zero_samples);
       OPT(register_coalesce);
       OPT(compute_to_mrf);
 
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index 4e17d44..6200deb 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -324,6 +324,7 @@ public:
    bool opt_peephole_predicated_break();
    bool opt_saturate_propagation();
    bool opt_cmod_propagation();
+   bool opt_zero_samples();
    void emit_bool_to_cond_code(ir_rvalue *condition);
    void emit_bool_to_cond_code_of_reg(ir_expression *expr, fs_reg op[3]);
    void emit_if_gen6(ir_if *ir);




More information about the mesa-commit mailing list