[Mesa-dev] [PATCH v2] anv/skylake: disable ForceThreadDispatchEnable

Sergii Romantsov sergii.romantsov at gmail.com
Wed Sep 19 16:21:11 UTC 2018


On Skylake, enabling ForceThreadDispatchEnable causes a GPU hang.

-v2: ForceThreadDispatchEnable is now set only on gen8; on gen9 and
     higher, setting PixelShaderHasUAV is restored instead.

CC: Jason Ekstrand <jason.ekstrand at intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107941
Fixes: 79270d2140ec (anv: Stop setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV)
Signed-off-by: Sergii Romantsov <sergii.romantsov at globallogic.com>
---
 src/intel/vulkan/genX_pipeline.c | 33 ++++++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)
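
Note (not part of the commit): the per-generation selection this patch
aims for can be modelled roughly as below. This is only a sketch under
stated assumptions. The struct and function names are hypothetical
stand-ins rather than anv's real types, and the gen8 condition is assumed
to match the gen9+ condition added in emit_3dstate_ps_extra().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the fragment shader state the
 * driver consults; not the real wm_prog_data / anv_pipeline layout. */
struct fs_info {
   bool has_side_effects;
   bool uses_kill;
   bool color_write_enabled;
};

/* Hypothetical stand-in for the two hardware bits under discussion. */
struct ps_bits {
   bool force_thread_dispatch;   /* 3DSTATE_WM, gen8 only after this patch */
   bool pixel_shader_has_uav;    /* 3DSTATE_PS_EXTRA, gen9 and higher */
};

static struct ps_bits
choose_ps_bits(int gen, struct fs_info fs)
{
   /* Assumption: this sketch only models gen7 and newer. */
   assert(gen >= 7);

   struct ps_bits bits = { false, false };

   /* The shader must run even though no colour write would trigger it. */
   bool needs_dispatch = (fs.has_side_effects || fs.uses_kill) &&
                         !fs.color_write_enabled;

   if (gen == 8)
      bits.force_thread_dispatch = needs_dispatch;
   else if (gen >= 9)
      bits.pixel_shader_has_uav = needs_dispatch;

   return bits;
}
```

With this model, a shader that uses kill and has no colour writes gets
the force-dispatch bit on gen8 and the UAV bit on gen9, never both.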

diff --git a/src/intel/vulkan/genX_pipeline.c b/src/intel/vulkan/genX_pipeline.c
index 9595a71..b469270 100644
--- a/src/intel/vulkan/genX_pipeline.c
+++ b/src/intel/vulkan/genX_pipeline.c
@@ -1445,7 +1445,7 @@ emit_3dstate_wm(struct anv_pipeline *pipeline, struct anv_subpass *subpass,
             wm.EarlyDepthStencilControl         = EDSC_NORMAL;
          }
 
-#if GEN_GEN >= 8
+#if GEN_GEN == 8
          /* Gen8 hardware tries to compute ThreadDispatchEnable for us but
           * doesn't take into account KillPixels when no depth or stencil
           * writes are enabled.  In order for occlusion queries to work
@@ -1663,6 +1663,37 @@ emit_3dstate_ps_extra(struct anv_pipeline *pipeline,
                                          wm_prog_data->uses_kill;
 
 #if GEN_GEN >= 9
+      /* The stricter cross-primitive coherency guarantees that the hardware
+       * gives us with the "Accesses UAV" bit set for at least one shader stage
+       * and the "UAV coherency required" bit set on the 3DPRIMITIVE command are
+       * redundant within the current image, atomic counter and SSBO GL APIs,
+       * which all have very loose ordering and coherency requirements and
+       * generally rely on the application to insert explicit barriers when a
+       * shader invocation is expected to see the memory writes performed by the
+       * invocations of some previous primitive.  Regardless of the value of
+       * "UAV coherency required", the "Accesses UAV" bits will implicitly cause
+       * an in most cases useless DC flush when the lowermost stage with the bit
+       * set finishes execution.
+       *
+       * It would be nice to disable it, but in some cases we can't because on
+       * Gen8+ it also has an influence on rasterization via the PS UAV-only
+       * signal (which could be set independently from the coherency mechanism
+       * in the 3DSTATE_WM command on Gen7), and because in some cases it will
+       * determine whether the hardware skips execution of the fragment shader
+       * or not via the ThreadDispatchEnable signal.  However if we know that
+       * GEN8_PS_BLEND_HAS_WRITEABLE_RT is going to be set and
+       * GEN8_PSX_PIXEL_SHADER_NO_RT_WRITE is not set it shouldn't make any
+       * difference so we may just disable it here.
+       *
+       * Gen8 hardware tries to compute ThreadDispatchEnable for us but doesn't
+       * take into account KillPixels when no depth or stencil writes are
+       * enabled. In order for occlusion queries to work correctly with no
+       * attachments, we need to force-enable here.
+       */
+      if ((wm_prog_data->has_side_effects || wm_prog_data->uses_kill) &&
+          !has_color_buffer_write_enabled(pipeline, blend))
+         ps.PixelShaderHasUAV = true;
+
       ps.PixelShaderComputesStencil = wm_prog_data->computed_stencil;
       ps.PixelShaderPullsBary    = wm_prog_data->pulls_bary;
 
-- 
2.7.4


