[Mesa-dev] [PATCH] anv: Stop setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV
jason at jlekstrand.net
Thu Jul 12 05:50:08 UTC 2018
We've had several Broadwell hangs that have come down to this bit just
not working correctly. Most recently, we've had a pile of hangs
reported with apps running under DXVK:
Instead, use the bit that doesn't try to imply weird D3D coherency
things and just force-enables the PS like we want.
cc: mesa-stable at lists.freedesktop.org
src/intel/vulkan/genX_pipeline.c | 53 +++++++++++++-------------------
1 file changed, 22 insertions(+), 31 deletions(-)
diff --git a/src/intel/vulkan/genX_pipeline.c b/src/intel/vulkan/genX_pipeline.c
index aa63ad0e097..4004a36f74f 100644
@@ -1389,6 +1389,28 @@ emit_3dstate_wm(struct anv_pipeline *pipeline, struct anv_subpass *subpass,
wm.EarlyDepthStencilControl = EDSC_NORMAL;
+#if GEN_GEN >= 8
+ /* Gen8 hardware tries to compute ThreadDispatchEnable for us but
+ * doesn't take into account KillPixels when no depth or stencil
+ * writes are enabled. In order for occlusion queries to work
+ * correctly with no attachments, we need to force-enable PS thread
+ * dispatch.
+ * The BDW docs are pretty clear that this bit isn't validated
+ * and probably shouldn't be used in production:
+ * "This must always be set to Normal. This field should not be
+ * tested for functional validation."
+ * Unfortunately, however, the other mechanism we have for doing this
+ * is 3DSTATE_PS_EXTRA::PixelShaderHasUAV which causes hangs on BDW.
+ * Given two bad options, we choose the one which works.
+ */
+ if ((wm_prog_data->has_side_effects || wm_prog_data->uses_kill) &&
+ !has_color_buffer_write_enabled(pipeline, blend))
+ wm.ForceThreadDispatchEnable = ForceON;
+#endif
@@ -1583,37 +1605,6 @@ emit_3dstate_ps_extra(struct anv_pipeline *pipeline,
ps.PixelShaderKillsPixel = subpass->has_ds_self_dep ||
- /* The stricter cross-primitive coherency guarantees that the hardware
- * gives us with the "Accesses UAV" bit set for at least one shader stage
- * and the "UAV coherency required" bit set on the 3DPRIMITIVE command are
- * redundant within the current image, atomic counter and SSBO GL APIs,
- * which all have very loose ordering and coherency requirements and
- * generally rely on the application to insert explicit barriers when a
- * shader invocation is expected to see the memory writes performed by the
- * invocations of some previous primitive. Regardless of the value of
- * "UAV coherency required", the "Accesses UAV" bits will implicitly cause
- * an in most cases useless DC flush when the lowermost stage with the bit
- * set finishes execution.
- * It would be nice to disable it, but in some cases we can't because on
- * Gen8+ it also has an influence on rasterization via the PS UAV-only
- * signal (which could be set independently from the coherency mechanism
- * in the 3DSTATE_WM command on Gen7), and because in some cases it will
- * determine whether the hardware skips execution of the fragment shader
- * or not via the ThreadDispatchEnable signal. However if we know that
- * GEN8_PS_BLEND_HAS_WRITEABLE_RT is going to be set and
- * GEN8_PSX_PIXEL_SHADER_NO_RT_WRITE is not set it shouldn't make any
- * difference so we may just disable it here.
- * Gen8 hardware tries to compute ThreadDispatchEnable for us but doesn't
- * take into account KillPixels when no depth or stencil writes are
- * enabled. In order for occlusion queries to work correctly with no
- * attachments, we need to force-enable here.
- */
- if ((wm_prog_data->has_side_effects || wm_prog_data->uses_kill) &&
- !has_color_buffer_write_enabled(pipeline, blend))
- ps.PixelShaderHasUAV = true;
#if GEN_GEN >= 9
ps.PixelShaderComputesStencil = wm_prog_data->computed_stencil;
ps.PixelShaderPullsBary = wm_prog_data->pulls_bary;
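
For readers skimming the diff, the condition that moves from emit_3dstate_ps_extra() to emit_3dstate_wm() can be read as a single predicate. The following is only a minimal standalone sketch of that predicate, not driver code: struct fs_info and needs_forced_ps_dispatch() are hypothetical names introduced for illustration, and the color_write_enabled parameter stands in for the result of has_color_buffer_write_enabled(pipeline, blend).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two wm_prog_data fields the patch reads. */
struct fs_info {
   bool has_side_effects;  /* field name taken from the patch */
   bool uses_kill;         /* field name taken from the patch */
};

/* Hypothetical helper: returns true when the patch would set
 * wm.ForceThreadDispatchEnable = ForceON.  That is the case when the
 * fragment shader has side effects or can kill pixels, and no color
 * buffer write is enabled.
 */
bool
needs_forced_ps_dispatch(const struct fs_info *fs, bool color_write_enabled)
{
   return (fs->has_side_effects || fs->uses_kill) && !color_write_enabled;
}

/* Small self-check covering the cases discussed in the commit message. */
void
self_check(void)
{
   const struct fs_info kill_only = { .has_side_effects = false, .uses_kill = true };
   const struct fs_info plain     = { .has_side_effects = false, .uses_kill = false };

   /* Discarding shader with no color writes: dispatch must be forced. */
   assert(needs_forced_ps_dispatch(&kill_only, false));
   /* Same shader with a color write enabled: nothing to force. */
   assert(!needs_forced_ps_dispatch(&kill_only, true));
   /* Shader with neither property: nothing to force. */
   assert(!needs_forced_ps_dispatch(&plain, false));
}
```

The predicate is unchanged by the patch; only the state it drives changes, from ps.PixelShaderHasUAV to wm.ForceThreadDispatchEnable.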