[Mesa-dev] [PATCH] i965: Enable the HiZ RAW Stall Optimization on Gen8.

Kenneth Graunke kenneth at whitecape.org
Sat Jan 10 16:46:41 PST 2015


This is an important optimization for avoiding read-after-write (RAW)
stalls in the HiZ buffer.  Certain workloads ran very slowly with HiZ
enabled, but much faster with the "hiz=false" driconf option.  With this
patch, they run at full speed even with HiZ.

Improves performance in OglVSInstancing by 3.2x on Iris Pro 6200.

Thanks to Jesse Barnes for finding this missing bit!

One could argue that the kernel should be setting this register, but
it's part of our hardware context, so we should probably set it
ourselves.  It's easy enough and only likely to affect 3D.

It's not entirely clear whether this is necessary on Cherryview.  Mine
had the optimization enabled by default, but the documentation seems to
indicate that some Cherryview systems may have it disabled by default.
At any rate, it's harmless to enable it - it just might be redundant.

Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
Cc: Jesse Barnes <jbarnes at virtuousgeek.org>
---
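Note (below the cut, so not part of the commit message): CACHE_MODE_0 is
written through the _MASKED_BIT_* macros this patch adds, where the upper 16
bits of the written value select which of the lower 16 bits change.  The
sketch below models that in plain C.  apply_masked_write() and
masked_write_demo() are hypothetical helpers for illustration only, not Mesa
or kernel functions, and the starting register value is made up.  It also
assumes that bit 2 is a "disable" bit, so clearing it turns the optimization
on; check that against the PRM.

```c
#include <assert.h>
#include <stdint.h>

#define _MASKED_BIT_ENABLE(a)  ((a) | ((a) << 16))
#define _MASKED_BIT_DISABLE(a) ((a) << 16)

/* Hypothetical model of a masked register write: bits 31:16 of the
 * written value are a per-bit write mask for bits 15:0. */
uint32_t apply_masked_write(uint32_t reg, uint32_t value)
{
    uint32_t mask = value >> 16;     /* which bits may change */
    uint32_t bits = value & 0xffff;  /* new values for those bits */

    return (reg & ~mask) | (bits & mask);
}

/* Walks a made-up register value through a clear and a set of bit 2. */
void masked_write_demo(void)
{
    uint32_t reg = 0x0044;  /* assumed start: bits 2 and 6 set */

    /* Clearing bit 2 leaves bit 6 alone. */
    reg = apply_masked_write(reg, _MASKED_BIT_DISABLE(1u << 2));
    assert(reg == 0x0040);

    /* Setting bit 2 again restores the starting value. */
    reg = apply_masked_write(reg, _MASKED_BIT_ENABLE(1u << 2));
    assert(reg == 0x0044);
}
```

Because only the masked bit is touched, the LRI in this patch needs no
read-modify-write of the other CACHE_MODE_0 bits.
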
 src/mesa/drivers/dri/i965/brw_state_upload.c | 17 +++++++++++++++++
 src/mesa/drivers/dri/i965/intel_reg.h        |  6 ++++++
 2 files changed, 23 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c
index 7a25ef5..c8a04b3 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -349,6 +349,23 @@ brw_upload_initial_gpu_state(struct brw_context *brw)
    if (brw->gen >= 8) {
       gen8_emit_3dstate_sample_pattern(brw);
    }
+
+   /* From the Haswell PRM, Command Reference: Registers, CACHE_MODE_0, bit 2:
+    * "The Hierarchical Z RAW Stall Optimization allows non-overlapping
+    * polygons in the same 8x4 pixel/sample area to be processed without
+    * stalling waiting for the earlier ones to write to Hierarchical Z buffer."
+    *
+    * On Haswell, the kernel sets this for us (and we can't easily LRI).
+    * On Broadwell, it doesn't, but we can easily do it ourselves via an LRI.
+    * On Skylake, this is enabled by default so we don't need to change it.
+    */
+   if (brw->gen == 8) {
+      BEGIN_BATCH(3);
+      OUT_BATCH(MI_LOAD_REGISTER_IMM | (3 - 2));
+      OUT_BATCH(GEN7_CACHE_MODE_0);
+      OUT_BATCH(HIZ_RAW_STALL_OPT_ENABLE);
+      ADVANCE_BATCH();
+   }
 }
 
 void brw_init_state( struct brw_context *brw )
diff --git a/src/mesa/drivers/dri/i965/intel_reg.h b/src/mesa/drivers/dri/i965/intel_reg.h
index 5ac0180..cdeb8f3 100644
--- a/src/mesa/drivers/dri/i965/intel_reg.h
+++ b/src/mesa/drivers/dri/i965/intel_reg.h
@@ -25,6 +25,9 @@
  *
  **************************************************************************/
 
+#define _MASKED_BIT_ENABLE(a)  ((a) | ((a) << 16))
+#define _MASKED_BIT_DISABLE(a) ((a) << 16)
+
 #define CMD_MI				(0x0 << 29)
 #define CMD_2D				(0x2 << 29)
 #define CMD_3D				(0x3 << 29)
@@ -139,6 +142,9 @@
 #define GEN7_3DPRIM_START_INSTANCE      0x243C
 #define GEN7_3DPRIM_BASE_VERTEX         0x2440
 
+#define GEN7_CACHE_MODE_0               0x7000
+# define HIZ_RAW_STALL_OPT_ENABLE       _MASKED_BIT_DISABLE(1 << 2) /* bit 2 is a disable bit; clearing it enables the optimization */
+
 #define GEN7_CACHE_MODE_1               0x7004
 # define GEN8_HIZ_NP_PMA_FIX_ENABLE        (1 << 11)
 # define GEN8_HIZ_NP_EARLY_Z_FAILS_DISABLE (1 << 13)
-- 
2.2.1


