[Mesa-dev] [PATCH 2/2] Revert "anv/skylake: disable ForceThreadDispatchEnable"

Juan A. Suarez Romero jasuarez at igalia.com
Mon Oct 29 11:44:50 UTC 2018


On Tue, 2018-10-16 at 15:12 -0500, Jason Ekstrand wrote:
> This reverts commit 0fa9e6d7b304f6a8064ed78a4b9c557e1026e7e5.  The real
> issue appears to have been that HiZ ops don't like having WM thread
> dispatch force-enabled.  The previous commit fixes that problem so we
> can go back to using the ForceThreadDispatchEnable bit even on SKL+.
> ---
>  src/intel/vulkan/genX_pipeline.c | 42 ++++++--------------------------
>  1 file changed, 7 insertions(+), 35 deletions(-)
> 


While this patch has landed in master with CC to @stable, I'm adding it to the
rejected list for 18.2, as the commit it reverts, 0fa9e6d7b30 ("anv/skylake:
disable ForceThreadDispatchEnable"), never landed in 18.2 branch.


The reason why 0fa9e6d7b30 didn't arrive to 18.2 was due an error in the fixes
tag. This commit fixes abd629eb3d4 "(anv: Stop setting
3DSTATE_PS_EXTRA::PixelShaderHasUAV"), but instead we set in the Fixes: tag
79270d2140ec "(anv: Stop setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV"), which is
actually a cherry-pick of the former. Hence that is why our scripts didn't catch
it.

In any case as 0fa9e6d7b30 ("anv/skylake: disable ForceThreadDispatchEnable") is
not in 18.2, no need to revert it, so hence adding this patch to cherry-ignore.

	J.A.


> diff --git a/src/intel/vulkan/genX_pipeline.c b/src/intel/vulkan/genX_pipeline.c
> index 33f1f7832ac..9595a7133ae 100644
> --- a/src/intel/vulkan/genX_pipeline.c
> +++ b/src/intel/vulkan/genX_pipeline.c
> @@ -1445,12 +1445,12 @@ emit_3dstate_wm(struct anv_pipeline *pipeline, struct anv_subpass *subpass,
>              wm.EarlyDepthStencilControl         = EDSC_NORMAL;
>           }
>  
> -#if GEN_GEN == 8
> -         /* Gen8 and later hardware tries to compute ThreadDispatchEnable for
> -          * us but doesn't take into account KillPixels when no depth or
> -          * stencil writes are enabled.  In order for occlusion queries to
> -          * work correctly with no attachments, we need to force-enable PS
> -          * thread dispatch.
> +#if GEN_GEN >= 8
> +         /* Gen8 hardware tries to compute ThreadDispatchEnable for us but
> +          * doesn't take into account KillPixels when no depth or stencil
> +          * writes are enabled.  In order for occlusion queries to work
> +          * correctly with no attachments, we need to force-enable PS thread
> +          * dispatch.
>            *
>            * The BDW docs are pretty clear that that this bit isn't validated
>            * and probably shouldn't be used in production:
> @@ -1460,9 +1460,7 @@ emit_3dstate_wm(struct anv_pipeline *pipeline, struct anv_subpass *subpass,
>            *
>            * Unfortunately, however, the other mechanism we have for doing this
>            * is 3DSTATE_PS_EXTRA::PixelShaderHasUAV which causes hangs on BDW.
> -          * Given two bad options, we choose the one which works.  On Skylake
> -          * and later, setting ForceThreadDispatchEnable causes GPU hangs so
> -          * we use the PixelShaderHasUAV mechanism there.
> +          * Given two bad options, we choose the one which works.
>            */
>           if ((wm_prog_data->has_side_effects || wm_prog_data->uses_kill) &&
>               !has_color_buffer_write_enabled(pipeline, blend))
> @@ -1665,32 +1663,6 @@ emit_3dstate_ps_extra(struct anv_pipeline *pipeline,
>                                           wm_prog_data->uses_kill;
>  
>  #if GEN_GEN >= 9
> -      /* Gen8 and later hardware tries to compute ThreadDispatchEnable for us
> -       * but doesn't take into account KillPixels when no depth or stencil
> -       * writes are enabled.  In order for occlusion queries to work correctly
> -       * with no attachments, we need to force-enable PS thread dispatch.
> -       *
> -       * The stricter cross-primitive coherency guarantees that the hardware
> -       * gives us with the "Accesses UAV" bit set for at least one shader stage
> -       * and the "UAV coherency required" bit set on the 3DPRIMITIVE command are
> -       * redundant within the current image, atomic counter and SSBO GL and
> -       * Vulkan APIs, which all have very loose ordering and coherency
> -       * requirements and generally rely on the application to insert explicit
> -       * barriers when a shader invocation is expected to see the memory
> -       * writes performed by the invocations of some previous primitive.
> -       * Regardless of the value of "UAV coherency required", the "Accesses
> -       * UAV" bits will implicitly cause an in most cases useless DC flush
> -       * when the lowermost stage with the bit set finishes execution.
> -       *
> -       * Unfortunately, however, the other mechanism we have for doing this is
> -       * 3DSTATE_WM::ForceThreadDispatchEnable which causes GPU hangs on
> -       * Skylake and later hardware.  On Broadwell, however, setting this bit
> -       * causes GPU hangs so we use ForceThreadDispatchEnable there.
> -       */
> -      if ((wm_prog_data->has_side_effects || wm_prog_data->uses_kill) &&
> -          !has_color_buffer_write_enabled(pipeline, blend))
> -         ps.PixelShaderHasUAV = true;
> -
>        ps.PixelShaderComputesStencil = wm_prog_data->computed_stencil;
>        ps.PixelShaderPullsBary    = wm_prog_data->pulls_bary;
>  



More information about the mesa-dev mailing list