Mesa (master): anv/skylake: disable ForceThreadDispatchEnable

GitLab Mirror gitlab-mirror at kemper.freedesktop.org
Tue Oct 16 18:21:28 UTC 2018


Module: Mesa
Branch: master
Commit: 0fa9e6d7b304f6a8064ed78a4b9c557e1026e7e5
URL:    http://cgit.freedesktop.org/mesa/mesa/commit/?id=0fa9e6d7b304f6a8064ed78a4b9c557e1026e7e5

Author: Sergii Romantsov <sergii.romantsov at gmail.com>
Date:   Wed Sep 19 19:21:11 2018 +0300

anv/skylake: disable ForceThreadDispatchEnable

On Skylake, enabling ForceThreadDispatchEnable causes a GPU hang.

-v2: ForceThreadDispatchEnable is now enabled only on gen8; on gen9
     and higher, setting PixelShaderHasUAV is restored instead.

-v3 (Jason Ekstrand): Rework the comments a bit.

CC: Jason Ekstrand <jason.ekstrand at intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107941
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107760
Fixes: 79270d2140ec (anv: Stop setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV)
Signed-off-by: Sergii Romantsov <sergii.romantsov at globallogic.com>
Reviewed-by: Jason Ekstrand <jason at jlekstrand.net>

---

 src/intel/vulkan/genX_pipeline.c | 42 +++++++++++++++++++++++++++++++++-------
 1 file changed, 35 insertions(+), 7 deletions(-)

diff --git a/src/intel/vulkan/genX_pipeline.c b/src/intel/vulkan/genX_pipeline.c
index 9595a7133a..33f1f7832a 100644
--- a/src/intel/vulkan/genX_pipeline.c
+++ b/src/intel/vulkan/genX_pipeline.c
@@ -1445,12 +1445,12 @@ emit_3dstate_wm(struct anv_pipeline *pipeline, struct anv_subpass *subpass,
             wm.EarlyDepthStencilControl         = EDSC_NORMAL;
          }
 
-#if GEN_GEN >= 8
-         /* Gen8 hardware tries to compute ThreadDispatchEnable for us but
-          * doesn't take into account KillPixels when no depth or stencil
-          * writes are enabled.  In order for occlusion queries to work
-          * correctly with no attachments, we need to force-enable PS thread
-          * dispatch.
+#if GEN_GEN == 8
+         /* Gen8 and later hardware tries to compute ThreadDispatchEnable for
+          * us but doesn't take into account KillPixels when no depth or
+          * stencil writes are enabled.  In order for occlusion queries to
+          * work correctly with no attachments, we need to force-enable PS
+          * thread dispatch.
           *
           * The BDW docs are pretty clear that that this bit isn't validated
           * and probably shouldn't be used in production:
@@ -1460,7 +1460,9 @@ emit_3dstate_wm(struct anv_pipeline *pipeline, struct anv_subpass *subpass,
           *
           * Unfortunately, however, the other mechanism we have for doing this
           * is 3DSTATE_PS_EXTRA::PixelShaderHasUAV which causes hangs on BDW.
-          * Given two bad options, we choose the one which works.
+          * Given two bad options, we choose the one which works.  On Skylake
+          * and later, setting ForceThreadDispatchEnable causes GPU hangs so
+          * we use the PixelShaderHasUAV mechanism there.
           */
          if ((wm_prog_data->has_side_effects || wm_prog_data->uses_kill) &&
              !has_color_buffer_write_enabled(pipeline, blend))
@@ -1663,6 +1665,32 @@ emit_3dstate_ps_extra(struct anv_pipeline *pipeline,
                                          wm_prog_data->uses_kill;
 
 #if GEN_GEN >= 9
+      /* Gen8 and later hardware tries to compute ThreadDispatchEnable for us
+       * but doesn't take into account KillPixels when no depth or stencil
+       * writes are enabled.  In order for occlusion queries to work correctly
+       * with no attachments, we need to force-enable PS thread dispatch.
+       *
+       * The stricter cross-primitive coherency guarantees that the hardware
+       * gives us with the "Accesses UAV" bit set for at least one shader stage
+       * and the "UAV coherency required" bit set on the 3DPRIMITIVE command are
+       * redundant within the current image, atomic counter and SSBO GL and
+       * Vulkan APIs, which all have very loose ordering and coherency
+       * requirements and generally rely on the application to insert explicit
+       * barriers when a shader invocation is expected to see the memory
+       * writes performed by the invocations of some previous primitive.
+       * Regardless of the value of "UAV coherency required", the "Accesses
+       * UAV" bits will implicitly cause an in most cases useless DC flush
+       * when the lowermost stage with the bit set finishes execution.
+       *
+       * Unfortunately, however, the other mechanism we have for doing this is
+       * 3DSTATE_WM::ForceThreadDispatchEnable which causes GPU hangs on
+       * Skylake and later hardware.  On Broadwell, however, setting this bit
+       * causes GPU hangs so we use ForceThreadDispatchEnable there.
+       */
+      if ((wm_prog_data->has_side_effects || wm_prog_data->uses_kill) &&
+          !has_color_buffer_write_enabled(pipeline, blend))
+         ps.PixelShaderHasUAV = true;
+
       ps.PixelShaderComputesStencil = wm_prog_data->computed_stencil;
       ps.PixelShaderPullsBary    = wm_prog_data->pulls_bary;
 




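The per-generation choice described in the two comments above can be summarized in a small standalone sketch. This is not Mesa code: the enum, the function name choose_dispatch_workaround and its parameters are hypothetical, and the real driver makes this choice at compile time with GEN_GEN preprocessor checks rather than at run time. The sketch only assumes what the patch states: the workaround is needed when the fragment shader has side effects or uses kill and no color buffer write is enabled; gen8 uses ForceThreadDispatchEnable, and gen9 and later use PixelShaderHasUAV.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; none of this exists in Mesa. */
enum dispatch_workaround {
   WA_NONE,                   /* no workaround from this patch applies */
   WA_FORCE_THREAD_DISPATCH,  /* 3DSTATE_WM::ForceThreadDispatchEnable */
   WA_PIXEL_SHADER_HAS_UAV,   /* 3DSTATE_PS_EXTRA::PixelShaderHasUAV */
};

/* Runtime model of the compile-time selection made by the patch.
 * "gen" stands in for GEN_GEN; the boolean inputs mirror
 * wm_prog_data->has_side_effects, wm_prog_data->uses_kill and the
 * result of has_color_buffer_write_enabled().
 */
static enum dispatch_workaround
choose_dispatch_workaround(int gen, bool has_side_effects, bool uses_kill,
                           bool color_write_enabled)
{
   /* The workaround is only needed when the shader must run even
    * though nothing is written to a color buffer.
    */
   if (!(has_side_effects || uses_kill) || color_write_enabled)
      return WA_NONE;

   if (gen == 8)
      return WA_FORCE_THREAD_DISPATCH;  /* PixelShaderHasUAV hangs on BDW */

   if (gen >= 9)
      return WA_PIXEL_SHADER_HAS_UAV;   /* force-dispatch hangs on SKL+ */

   /* Earlier generations are outside the scope of this patch. */
   return WA_NONE;
}
```

Under these assumptions, choose_dispatch_workaround(9, false, true, false) selects WA_PIXEL_SHADER_HAS_UAV, while the same shader on gen 8 selects WA_FORCE_THREAD_DISPATCH.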