[Mesa-dev] [PATCH] intel/compiler/gen9: Pixel shader header only workaround

Topi Pohjolainen topi.pohjolainen at gmail.com
Wed Oct 25 17:37:37 UTC 2017

Fixes intermittent GPU hangs on Broxton with an Intel internal
test case.

There are plenty of similar fragment shaders in piglit that use
neither varyings nor uniforms. According to the documentation,
the hang requires special timing between pipeline stages.
Apparently we just don't hit that timing with piglit. Even with
the failing test case one doesn't always get the hang.

Moreover, according to the error states, the hang happens
significantly later than the execution of the problematic shader.
There are multiple render cycles (primitive submissions) in between.
I've also seen error states where the ACTHD points outside the
batch. It is almost as if the hardware writes somewhere that gets
used later on. That would also explain why piglit doesn't suffer
from this: most tests kick off one render cycle, so any corruption
goes unseen.

v2 (Ken): Instead of enabling push constants, enable one of the
          inputs (PSIZ).
v3 (Ken, Jason): Use LAYER instead making vulkan emit_3dstate_sbe()

CC: Kenneth Graunke <kenneth at whitecape.org>
CC: Jason Ekstrand <jason at jlekstrand.net>
CC: Eero Tamminen <eero.t.tamminen at intel.com>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen at intel.com>
---
 src/intel/compiler/brw_fs.cpp | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 30e8841242..2c6dc1e5a0 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -6164,6 +6164,31 @@ fs_visitor::run_gs()
    return !failed;
 }
 
+/* From the SKL PRM, Volume 16, Workarounds:
+ *
+ *   0877  3D   Pixel Shader Hang possible when pixel shader dispatched with
+ *              only header phases (R0-R2)
+ *
+ *   WA: Enable a non-header phase (e.g. push constant) when dispatch would
+ *       have been header only.
+ *
+ * Instead of enabling push constants one can alternatively enable one of the
+ * inputs. Here one simply chooses the layer input (VARYING_SLOT_LAYER), which
+ * shouldn't impose much overhead.
+ */
+static void
+gen9_ps_header_only_workaround(struct brw_wm_prog_data *wm_prog_data)
+{
+   if (wm_prog_data->num_varying_inputs)
+      return;
+
+   if (wm_prog_data->base.curb_read_length)
+      return;
+
+   wm_prog_data->urb_setup[VARYING_SLOT_LAYER] = 0;
+   wm_prog_data->num_varying_inputs = 1;
+}
+
 bool
 fs_visitor::run_fs(bool allow_spilling, bool do_rep_send)
 {
@@ -6227,6 +6252,10 @@ fs_visitor::run_fs(bool allow_spilling, bool do_rep_send)
+      if (devinfo->gen >= 9)
+         gen9_ps_header_only_workaround(wm_prog_data);
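
For readers who want to try the decision logic outside of Mesa, the
following is a minimal standalone sketch of the check the patch adds.
It is not the driver code: the struct, the enum, and every mock_ name
are hypothetical stand-ins that model only the three fields the
workaround reads or writes, and the slot numbering is made up.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the Mesa definitions. The real
 * VARYING_SLOT_LAYER value and the real brw_wm_prog_data layout differ. */
enum { MOCK_VARYING_SLOT_LAYER = 0, MOCK_VARYING_SLOT_MAX = 4 };

struct mock_wm_prog_data {
   unsigned curb_read_length;             /* push constant registers read */
   unsigned num_varying_inputs;           /* varyings in the thread payload */
   int urb_setup[MOCK_VARYING_SLOT_MAX];  /* slot -> payload index, -1 = unused */
};

/* Build a mock program description with every varying slot marked unused.
 * Which slots a real shader would occupy is not modelled here. */
static struct mock_wm_prog_data
mock_make(unsigned curb_read_length, unsigned num_varying_inputs)
{
   struct mock_wm_prog_data d;
   d.curb_read_length = curb_read_length;
   d.num_varying_inputs = num_varying_inputs;
   for (int i = 0; i < MOCK_VARYING_SLOT_MAX; i++)
      d.urb_setup[i] = -1;
   return d;
}

/* Same shape as the check in the patch: if the dispatch would carry
 * neither varyings nor push constants, enable one input so that the
 * payload is no longer header only. */
static void
mock_ps_header_only_workaround(struct mock_wm_prog_data *d)
{
   if (d->num_varying_inputs)
      return;

   if (d->curb_read_length)
      return;

   d->urb_setup[MOCK_VARYING_SLOT_LAYER] = 0;
   d->num_varying_inputs = 1;
}
```

In this sketch a program built with mock_make(0, 0) comes out of the
workaround with one varying input and the layer slot mapped to payload
index 0, while a program that already reads push constants or varyings
is left untouched.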

