[Mesa-dev] [PATCH 2/2] Revert "anv/skylake: disable ForceThreadDispatchEnable"
Juan A. Suarez Romero
jasuarez at igalia.com
Mon Oct 29 11:44:50 UTC 2018
On Tue, 2018-10-16 at 15:12 -0500, Jason Ekstrand wrote:
> This reverts commit 0fa9e6d7b304f6a8064ed78a4b9c557e1026e7e5. The real
> issue appears to have been that HiZ ops don't like having WM thread
> dispatch force-enabled. The previous commit fixes that problem so we
> can go back to using the ForceThreadDispatchEnable bit even on SKL+.
> ---
> src/intel/vulkan/genX_pipeline.c | 42 ++++++--------------------------
> 1 file changed, 7 insertions(+), 35 deletions(-)
>
While this patch has landed in master with CC to @stable, I'm adding it to the
rejected list for 18.2, as the commit it reverts, 0fa9e6d7b30 ("anv/skylake:
disable ForceThreadDispatchEnable"), never landed in 18.2 branch.
The reason why 0fa9e6d7b30 didn't arrive to 18.2 was due an error in the fixes
tag. This commit fixes abd629eb3d4 "(anv: Stop setting
3DSTATE_PS_EXTRA::PixelShaderHasUAV"), but instead we set in the Fixes: tag
79270d2140ec "(anv: Stop setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV"), which is
actually a cherry-pick of the former. Hence that is why our scripts didn't catch
it.
In any case as 0fa9e6d7b30 ("anv/skylake: disable ForceThreadDispatchEnable") is
not in 18.2, no need to revert it, so hence adding this patch to cherry-ignore.
J.A.
> diff --git a/src/intel/vulkan/genX_pipeline.c b/src/intel/vulkan/genX_pipeline.c
> index 33f1f7832ac..9595a7133ae 100644
> --- a/src/intel/vulkan/genX_pipeline.c
> +++ b/src/intel/vulkan/genX_pipeline.c
> @@ -1445,12 +1445,12 @@ emit_3dstate_wm(struct anv_pipeline *pipeline, struct anv_subpass *subpass,
> wm.EarlyDepthStencilControl = EDSC_NORMAL;
> }
>
> -#if GEN_GEN == 8
> - /* Gen8 and later hardware tries to compute ThreadDispatchEnable for
> - * us but doesn't take into account KillPixels when no depth or
> - * stencil writes are enabled. In order for occlusion queries to
> - * work correctly with no attachments, we need to force-enable PS
> - * thread dispatch.
> +#if GEN_GEN >= 8
> + /* Gen8 hardware tries to compute ThreadDispatchEnable for us but
> + * doesn't take into account KillPixels when no depth or stencil
> + * writes are enabled. In order for occlusion queries to work
> + * correctly with no attachments, we need to force-enable PS thread
> + * dispatch.
> *
> * The BDW docs are pretty clear that that this bit isn't validated
> * and probably shouldn't be used in production:
> @@ -1460,9 +1460,7 @@ emit_3dstate_wm(struct anv_pipeline *pipeline, struct anv_subpass *subpass,
> *
> * Unfortunately, however, the other mechanism we have for doing this
> * is 3DSTATE_PS_EXTRA::PixelShaderHasUAV which causes hangs on BDW.
> - * Given two bad options, we choose the one which works. On Skylake
> - * and later, setting ForceThreadDispatchEnable causes GPU hangs so
> - * we use the PixelShaderHasUAV mechanism there.
> + * Given two bad options, we choose the one which works.
> */
> if ((wm_prog_data->has_side_effects || wm_prog_data->uses_kill) &&
> !has_color_buffer_write_enabled(pipeline, blend))
> @@ -1665,32 +1663,6 @@ emit_3dstate_ps_extra(struct anv_pipeline *pipeline,
> wm_prog_data->uses_kill;
>
> #if GEN_GEN >= 9
> - /* Gen8 and later hardware tries to compute ThreadDispatchEnable for us
> - * but doesn't take into account KillPixels when no depth or stencil
> - * writes are enabled. In order for occlusion queries to work correctly
> - * with no attachments, we need to force-enable PS thread dispatch.
> - *
> - * The stricter cross-primitive coherency guarantees that the hardware
> - * gives us with the "Accesses UAV" bit set for at least one shader stage
> - * and the "UAV coherency required" bit set on the 3DPRIMITIVE command are
> - * redundant within the current image, atomic counter and SSBO GL and
> - * Vulkan APIs, which all have very loose ordering and coherency
> - * requirements and generally rely on the application to insert explicit
> - * barriers when a shader invocation is expected to see the memory
> - * writes performed by the invocations of some previous primitive.
> - * Regardless of the value of "UAV coherency required", the "Accesses
> - * UAV" bits will implicitly cause an in most cases useless DC flush
> - * when the lowermost stage with the bit set finishes execution.
> - *
> - * Unfortunately, however, the other mechanism we have for doing this is
> - * 3DSTATE_WM::ForceThreadDispatchEnable which causes GPU hangs on
> - * Skylake and later hardware. On Broadwell, however, setting this bit
> - * causes GPU hangs so we use ForceThreadDispatchEnable there.
> - */
> - if ((wm_prog_data->has_side_effects || wm_prog_data->uses_kill) &&
> - !has_color_buffer_write_enabled(pipeline, blend))
> - ps.PixelShaderHasUAV = true;
> -
> ps.PixelShaderComputesStencil = wm_prog_data->computed_stencil;
> ps.PixelShaderPullsBary = wm_prog_data->pulls_bary;
>
More information about the mesa-dev
mailing list