<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Mon, Feb 13, 2017 at 6:35 PM, Francisco Jerez <span dir="ltr"><<a href="mailto:currojerez@riseup.net" target="_blank">currojerez@riseup.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>> writes:<br>
<br>
> The AccessUAV bit is not quite what we want because it's more about<br>
> coherency between storage operations than it is about dispatch. Also,<br>
> the 3DSTATE_PS_EXTRA::<wbr>PixelShaderHasUAV bit seems to cause hangs on<br>
> Broadwell for unknown reasons so it's best to just leave it off. The<br>
> 3DSTATE_WM::<wbr>ForceThreadDispatchEnable bit, on the other hand, does<br>
> exactly what we want and seems to work fine.<br>
> ---<br>
> src/mesa/drivers/dri/i965/brw_<wbr>defines.h | 2 ++<br>
> src/mesa/drivers/dri/i965/<wbr>gen8_ps_state.c | 52 ++++++++++--------------------<wbr>-<br>
> 2 files changed, 19 insertions(+), 35 deletions(-)<br>
><br>
> diff --git a/src/mesa/drivers/dri/i965/<wbr>brw_defines.h b/src/mesa/drivers/dri/i965/<wbr>brw_defines.h<br>
> index 3c5c6c4..9ad36ca 100644<br>
> --- a/src/mesa/drivers/dri/i965/<wbr>brw_defines.h<br>
> +++ b/src/mesa/drivers/dri/i965/<wbr>brw_defines.h<br>
> @@ -2733,6 +2733,8 @@ enum brw_barycentric_mode {<br>
> # define GEN7_WM_MSDISPMODE_PERSAMPLE (0 << 31)<br>
> # define GEN7_WM_MSDISPMODE_PERPIXEL (1 << 31)<br>
> # define HSW_WM_UAV_ONLY (1 << 30)<br>
> +# define GEN8_WM_FORCE_THREAD_DISPATCH_<wbr>OFF (1 << 19)<br>
> +# define GEN8_WM_FORCE_THREAD_DISPATCH_<wbr>ON (2 << 19)<br>
><br>
> #define _3DSTATE_PS 0x7820 /* GEN7+ */<br>
> /* DW1: kernel pointer */<br>
> diff --git a/src/mesa/drivers/dri/i965/<wbr>gen8_ps_state.c b/src/mesa/drivers/dri/i965/<wbr>gen8_ps_state.c<br>
> index 0346826..cca57e6 100644<br>
> --- a/src/mesa/drivers/dri/i965/<wbr>gen8_ps_state.c<br>
> +++ b/src/mesa/drivers/dri/i965/<wbr>gen8_ps_state.c<br>
> @@ -74,39 +74,6 @@ gen8_upload_ps_extra(struct brw_context *brw,<br>
> if (brw->gen >= 9 && prog_data->pulls_bary)<br>
> dw1 |= GEN9_PSX_SHADER_PULLS_BARY;<br>
><br>
> - /* The stricter cross-primitive coherency guarantees that the hardware<br>
> - * gives us with the "Accesses UAV" bit set for at least one shader stage<br>
> - * and the "UAV coherency required" bit set on the 3DPRIMITIVE command are<br>
> - * redundant within the current image, atomic counter and SSBO GL APIs,<br>
> - * which all have very loose ordering and coherency requirements and<br>
> - * generally rely on the application to insert explicit barriers when a<br>
> - * shader invocation is expected to see the memory writes performed by the<br>
> - * invocations of some previous primitive. Regardless of the value of "UAV<br>
> - * coherency required", the "Accesses UAV" bits will implicitly cause an in<br>
> - * most cases useless DC flush when the lowermost stage with the bit set<br>
> - * finishes execution.<br>
> - *<br>
> - * It would be nice to disable it, but in some cases we can't because on<br>
> - * Gen8+ it also has an influence on rasterization via the PS UAV-only<br>
> - * signal (which could be set independently from the coherency mechanism in<br>
> - * the 3DSTATE_WM command on Gen7), and because in some cases it will<br>
> - * determine whether the hardware skips execution of the fragment shader or<br>
> - * not via the ThreadDispatchEnable signal. However if we know that<br>
> - * GEN8_PS_BLEND_HAS_WRITEABLE_RT is going to be set and<br>
> - * GEN8_PSX_PIXEL_SHADER_NO_RT_<wbr>WRITE is not set it shouldn't make any<br>
> - * difference so we may just disable it here.<br>
> - *<br>
> - * Gen8 hardware tries to compute ThreadDispatchEnable for us but doesn't<br>
> - * take into account KillPixels when no depth or stencil writes are enabled.<br>
> - * In order for occlusion queries to work correctly with no attachments, we<br>
> - * need to force-enable here.<br>
> - *<br>
> - * BRW_NEW_FS_PROG_DATA | BRW_NEW_FRAGMENT_PROGRAM | _NEW_BUFFERS | _NEW_COLOR<br>
> - */<br>
> - if ((prog_data->has_side_effects || prog_data->uses_kill) &&<br>
> - !brw_color_buffer_write_<wbr>enabled(brw))<br>
> - dw1 |= GEN8_PSX_SHADER_HAS_UAV;<br>
> -<br>
> if (prog_data->computed_stencil) {<br>
> assert(brw->gen >= 9);<br>
> dw1 |= GEN9_PSX_SHADER_COMPUTES_<wbr>STENCIL;<br>
> @@ -127,7 +94,6 @@ upload_ps_extra(struct brw_context *brw)<br>
><br>
> const struct brw_tracked_state gen8_ps_extra = {<br>
> .dirty = {<br>
> - .mesa = _NEW_BUFFERS | _NEW_COLOR,<br>
> .brw = BRW_NEW_BLORP |<br>
> BRW_NEW_CONTEXT |<br>
> BRW_NEW_FRAGMENT_PROGRAM |<br>
> @@ -169,6 +135,20 @@ upload_wm_state(struct brw_context *brw)<br>
> else if (wm_prog_data->has_side_<wbr>effects)<br>
> dw1 |= GEN7_WM_EARLY_DS_CONTROL_<wbr>PSEXEC;<br>
><br>
> + /* In most cases, the computation for the WM_INT::ThreadDispatchEnable<br>
> + * does exactly what you want. However, when you don't have any color<br>
> + * buffers or when depth/stencil writes are disabled, it misses a few<br>
> + * cases. In those cases, we force it to dispatch the PS by using the<br>
> + * 3DSTATE_WM::<wbr>ForceThreadDispatchEnable bit. According to the docs,<br>
> + * this bit is not validated and shouldn't be used. However, it seems<br>
> + * to work fine and this is exactly the type of thing you would use such<br>
> + *<br>
> + * BRW_NEW_FS_PROG_DATA | _NEW_BUFFERS | _NEW_COLOR<br>
> + */<br>
> + if ((wm_prog_data->has_side_<wbr>effects || wm_prog_data->uses_kill) &&<br>
> + !brw_color_buffer_write_<wbr>enabled(brw))<br>
> + dw1 |= GEN8_WM_FORCE_THREAD_DISPATCH_<wbr>ON;<br>
> +<br>
<br>
</div></div>According to the docs the ForceThreadDispatchEnable bits will override<br>
the WM ThreadDispatchEnable signal to cause fragment shader dispatch<br>
even during the execution of a hi-Z op. Doesn't this mean that you now<br>
need to re-emit a 3DSTATE_WM packet prior to the execution of a hi-Z op<br>
to make sure the ForceThreadDispatchEnable field is clear?</blockquote><div><br></div><div>That's an interesting question. I had sort-of assumed that if you used the WM_HZ_OP packet, it would implicitly re-emit 3DSTATE_WM and that the comment on 3DSTATE_WM::ForceThreadDispatchEnable only applied to hiz ops. Now that I read things a bit more carefully, it appears that the formula explicitly includes both 3DSTATE_WM and WM_HZ_OP so there probably is an interaction there.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Also, the<br>
PMA fix seems to be sensitive on the state of ForceThreadDispatchEnable,<br>
but I don't think this will affect the actual calculation because it<br>
only cares about thread dispatch being forced off. You may want to<br>
update the comment in gen8_depth_state though.<span class=""><br></span></blockquote><div><br></div><div>I think the PMA fix is fine but a comment may be in order.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">
> BEGIN_BATCH(2);<br>
> OUT_BATCH(_3DSTATE_WM << 16 | (2 - 2));<br>
> OUT_BATCH(dw1);<br>
> @@ -177,7 +157,9 @@ upload_wm_state(struct brw_context *brw)<br>
><br>
> const struct brw_tracked_state gen8_wm_state = {<br>
> .dirty = {<br>
> - .mesa = _NEW_LINE |<br>
> + .mesa = _NEW_BUFFERS |<br>
> + _NEW_COLOR |<br>
> + _NEW_LINE |<br>
> _NEW_POLYGON,<br>
> .brw = BRW_NEW_BLORP |<br>
> BRW_NEW_CONTEXT |<br>
> --<br>
> 2.5.0.400.gff86faf<br>
><br>
</span>> ______________________________<wbr>_________________<br>
> mesa-dev mailing list<br>
> <a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a><br>
> <a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/<wbr>mailman/listinfo/mesa-dev</a><br>
</blockquote></div><br></div></div>