Mesa (master): anv/skylake: disable ForceThreadDispatchEnable
GitLab Mirror
gitlab-mirror at kemper.freedesktop.org
Tue Oct 16 18:21:28 UTC 2018
Module: Mesa
Branch: master
Commit: 0fa9e6d7b304f6a8064ed78a4b9c557e1026e7e5
URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=0fa9e6d7b304f6a8064ed78a4b9c557e1026e7e5
Author: Sergii Romantsov <sergii.romantsov at gmail.com>
Date: Wed Sep 19 19:21:11 2018 +0300
anv/skylake: disable ForceThreadDispatchEnable
On Skylake, enabling ForceThreadDispatchEnable causes a GPU hang.
-v2: enable ForceThreadDispatchEnable only for gen8; for gen9 and
higher, restore the setting of PixelShaderHasUAV.
-v3 (Jason Ekstrand): Rework the comments a bit.
CC: Jason Ekstrand <jason.ekstrand at intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107941
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107760
Fixes: 79270d2140ec (anv: Stop setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV)
Signed-off-by: Sergii Romantsov <sergii.romantsov at globallogic.com>
Reviewed-by: Jason Ekstrand <jason at jlekstrand.net>
---
src/intel/vulkan/genX_pipeline.c | 42 +++++++++++++++++++++++++++++++++-------
1 file changed, 35 insertions(+), 7 deletions(-)
diff --git a/src/intel/vulkan/genX_pipeline.c b/src/intel/vulkan/genX_pipeline.c
index 9595a7133a..33f1f7832a 100644
--- a/src/intel/vulkan/genX_pipeline.c
+++ b/src/intel/vulkan/genX_pipeline.c
@@ -1445,12 +1445,12 @@ emit_3dstate_wm(struct anv_pipeline *pipeline, struct anv_subpass *subpass,
wm.EarlyDepthStencilControl = EDSC_NORMAL;
}
-#if GEN_GEN >= 8
- /* Gen8 hardware tries to compute ThreadDispatchEnable for us but
- * doesn't take into account KillPixels when no depth or stencil
- * writes are enabled. In order for occlusion queries to work
- * correctly with no attachments, we need to force-enable PS thread
- * dispatch.
+#if GEN_GEN == 8
+ /* Gen8 and later hardware tries to compute ThreadDispatchEnable for
+ * us but doesn't take into account KillPixels when no depth or
+ * stencil writes are enabled. In order for occlusion queries to
+ * work correctly with no attachments, we need to force-enable PS
+ * thread dispatch.
*
* The BDW docs are pretty clear that that this bit isn't validated
* and probably shouldn't be used in production:
@@ -1460,7 +1460,9 @@ emit_3dstate_wm(struct anv_pipeline *pipeline, struct anv_subpass *subpass,
*
* Unfortunately, however, the other mechanism we have for doing this
* is 3DSTATE_PS_EXTRA::PixelShaderHasUAV which causes hangs on BDW.
- * Given two bad options, we choose the one which works.
+ * Given two bad options, we choose the one which works. On Skylake
+ * and later, setting ForceThreadDispatchEnable causes GPU hangs so
+ * we use the PixelShaderHasUAV mechanism there.
*/
if ((wm_prog_data->has_side_effects || wm_prog_data->uses_kill) &&
!has_color_buffer_write_enabled(pipeline, blend))
@@ -1663,6 +1665,32 @@ emit_3dstate_ps_extra(struct anv_pipeline *pipeline,
wm_prog_data->uses_kill;
#if GEN_GEN >= 9
+ /* Gen8 and later hardware tries to compute ThreadDispatchEnable for us
+ * but doesn't take into account KillPixels when no depth or stencil
+ * writes are enabled. In order for occlusion queries to work correctly
+ * with no attachments, we need to force-enable PS thread dispatch.
+ *
+ * The stricter cross-primitive coherency guarantees that the hardware
+ * gives us with the "Accesses UAV" bit set for at least one shader stage
+ * and the "UAV coherency required" bit set on the 3DPRIMITIVE command are
+ * redundant within the current image, atomic counter and SSBO GL and
+ * Vulkan APIs, which all have very loose ordering and coherency
+ * requirements and generally rely on the application to insert explicit
+ * barriers when a shader invocation is expected to see the memory
+ * writes performed by the invocations of some previous primitive.
+ * Regardless of the value of "UAV coherency required", the "Accesses
+ * UAV" bits will implicitly cause an in most cases useless DC flush
+ * when the lowermost stage with the bit set finishes execution.
+ *
+ * Unfortunately, however, the other mechanism we have for doing this is
+ * 3DSTATE_WM::ForceThreadDispatchEnable which causes GPU hangs on
+ * Skylake and later hardware. On Broadwell, however, setting this bit
+ * causes GPU hangs so we use ForceThreadDispatchEnable there.
+ */
+ if ((wm_prog_data->has_side_effects || wm_prog_data->uses_kill) &&
+ !has_color_buffer_write_enabled(pipeline, blend))
+ ps.PixelShaderHasUAV = true;
+
ps.PixelShaderComputesStencil = wm_prog_data->computed_stencil;
ps.PixelShaderPullsBary = wm_prog_data->pulls_bary;
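
The per-generation choice made by the two hunks above can be summarized in a small standalone sketch. This is only an illustration under stated assumptions: the enum, the function name select_dispatch_workaround, and the runtime `gen` parameter are hypothetical and do not exist in Mesa, where the selection happens at compile time through the GEN_GEN preprocessor checks shown in the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration only; none of these exist in Mesa. */
enum dispatch_workaround {
   WORKAROUND_NONE,
   WORKAROUND_FORCE_THREAD_DISPATCH, /* 3DSTATE_WM::ForceThreadDispatchEnable */
   WORKAROUND_PS_HAS_UAV,            /* 3DSTATE_PS_EXTRA::PixelShaderHasUAV */
};

/* Mirrors the condition used in both hunks: a workaround is needed only
 * when the fragment shader has side effects or uses kill, and no color
 * buffer write is enabled.  `gen` stands in for the compile-time GEN_GEN
 * value (an assumption made so the sketch can run as one program).
 */
static enum dispatch_workaround
select_dispatch_workaround(int gen, bool has_side_effects, bool uses_kill,
                           bool color_write_enabled)
{
   if (!(has_side_effects || uses_kill) || color_write_enabled)
      return WORKAROUND_NONE;

   if (gen == 8)
      return WORKAROUND_FORCE_THREAD_DISPATCH; /* PixelShaderHasUAV hangs on BDW */
   if (gen >= 9)
      return WORKAROUND_PS_HAS_UAV; /* ForceThreadDispatchEnable hangs on SKL+ */

   return WORKAROUND_NONE; /* the patch touches neither bit before gen8 */
}

/* Small usage example covering the cases the commit message describes. */
static void
self_check(void)
{
   assert(select_dispatch_workaround(8, false, true, false) ==
          WORKAROUND_FORCE_THREAD_DISPATCH);
   assert(select_dispatch_workaround(9, true, false, false) ==
          WORKAROUND_PS_HAS_UAV);
   assert(select_dispatch_workaround(9, false, true, true) ==
          WORKAROUND_NONE);
}
```

The sketch folds both bits into a single return value only to make the mutual exclusion visible; in the patch itself each bit is set in its own packet emission function.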