[Mesa-dev] [PATCH 1/2] i965/ps: Use ForceThreadDispatchEnable instead of AccessUAV.
Jason Ekstrand
jason at jlekstrand.net
Tue Feb 14 02:58:39 UTC 2017
On Mon, Feb 13, 2017 at 6:35 PM, Francisco Jerez <currojerez at riseup.net>
wrote:
> Jason Ekstrand <jason at jlekstrand.net> writes:
>
> > The AccessUAV bit is not quite what we want because it's more about
> > coherency between storage operations than it is about dispatch. Also,
> > the 3DSTATE_PS_EXTRA::PixelShaderHasUAV bit seems to cause hangs on
> > Broadwell for unknown reasons so it's best to just leave it off. The
> > 3DSTATE_WM::ForceThreadDispatchEnable bit, on the other hand, does
> > exactly what we want and seems to work fine.
> > ---
> > src/mesa/drivers/dri/i965/brw_defines.h | 2 ++
> > src/mesa/drivers/dri/i965/gen8_ps_state.c | 52
> ++++++++++---------------------
> > 2 files changed, 19 insertions(+), 35 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h
> b/src/mesa/drivers/dri/i965/brw_defines.h
> > index 3c5c6c4..9ad36ca 100644
> > --- a/src/mesa/drivers/dri/i965/brw_defines.h
> > +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> > @@ -2733,6 +2733,8 @@ enum brw_barycentric_mode {
> > # define GEN7_WM_MSDISPMODE_PERSAMPLE (0 << 31)
> > # define GEN7_WM_MSDISPMODE_PERPIXEL (1 << 31)
> > # define HSW_WM_UAV_ONLY (1 << 30)
> > +# define GEN8_WM_FORCE_THREAD_DISPATCH_OFF (1 << 19)
> > +# define GEN8_WM_FORCE_THREAD_DISPATCH_ON (2 << 19)
> >
> > #define _3DSTATE_PS 0x7820 /* GEN7+ */
> > /* DW1: kernel pointer */
> > diff --git a/src/mesa/drivers/dri/i965/gen8_ps_state.c
> b/src/mesa/drivers/dri/i965/gen8_ps_state.c
> > index 0346826..cca57e6 100644
> > --- a/src/mesa/drivers/dri/i965/gen8_ps_state.c
> > +++ b/src/mesa/drivers/dri/i965/gen8_ps_state.c
> > @@ -74,39 +74,6 @@ gen8_upload_ps_extra(struct brw_context *brw,
> > if (brw->gen >= 9 && prog_data->pulls_bary)
> > dw1 |= GEN9_PSX_SHADER_PULLS_BARY;
> >
> > - /* The stricter cross-primitive coherency guarantees that the
> hardware
> > - * gives us with the "Accesses UAV" bit set for at least one shader
> stage
> > - * and the "UAV coherency required" bit set on the 3DPRIMITIVE
> command are
> > - * redundant within the current image, atomic counter and SSBO GL
> APIs,
> > - * which all have very loose ordering and coherency requirements and
> > - * generally rely on the application to insert explicit barriers
> when a
> > - * shader invocation is expected to see the memory writes performed
> by the
> > - * invocations of some previous primitive. Regardless of the value
> of "UAV
> > - * coherency required", the "Accesses UAV" bits will implicitly
> cause an in
> > - * most cases useless DC flush when the lowermost stage with the bit
> set
> > - * finishes execution.
> > - *
> > - * It would be nice to disable it, but in some cases we can't
> because on
> > - * Gen8+ it also has an influence on rasterization via the PS
> UAV-only
> > - * signal (which could be set independently from the coherency
> mechanism in
> > - * the 3DSTATE_WM command on Gen7), and because in some cases it will
> > - * determine whether the hardware skips execution of the fragment
> shader or
> > - * not via the ThreadDispatchEnable signal. However if we know that
> > - * GEN8_PS_BLEND_HAS_WRITEABLE_RT is going to be set and
> > - * GEN8_PSX_PIXEL_SHADER_NO_RT_WRITE is not set it shouldn't make
> any
> > - * difference so we may just disable it here.
> > - *
> > - * Gen8 hardware tries to compute ThreadDispatchEnable for us but
> doesn't
> > - * take into account KillPixels when no depth or stencil writes are
> enabled.
> > - * In order for occlusion queries to work correctly with no
> attachments, we
> > - * need to force-enable here.
> > - *
> > - * BRW_NEW_FS_PROG_DATA | BRW_NEW_FRAGMENT_PROGRAM | _NEW_BUFFERS |
> _NEW_COLOR
> > - */
> > - if ((prog_data->has_side_effects || prog_data->uses_kill) &&
> > - !brw_color_buffer_write_enabled(brw))
> > - dw1 |= GEN8_PSX_SHADER_HAS_UAV;
> > -
> > if (prog_data->computed_stencil) {
> > assert(brw->gen >= 9);
> > dw1 |= GEN9_PSX_SHADER_COMPUTES_STENCIL;
> > @@ -127,7 +94,6 @@ upload_ps_extra(struct brw_context *brw)
> >
> > const struct brw_tracked_state gen8_ps_extra = {
> > .dirty = {
> > - .mesa = _NEW_BUFFERS | _NEW_COLOR,
> > .brw = BRW_NEW_BLORP |
> > BRW_NEW_CONTEXT |
> > BRW_NEW_FRAGMENT_PROGRAM |
> > @@ -169,6 +135,20 @@ upload_wm_state(struct brw_context *brw)
> > else if (wm_prog_data->has_side_effects)
> > dw1 |= GEN7_WM_EARLY_DS_CONTROL_PSEXEC;
> >
> > + /* In most cases, the computation for the
> WM_INT::ThreadDispatchEnable
> > + * does exactly what you want. However, when you don't have any
> color
> > + * buffers or when depth/stencil writes are disabled, it misses a few
> > + * cases. In those cases, we force it to dispatch the PS by using
> the
> > + * 3DSTATE_WM::ForceThreadDispatchEnable bit. According to the
> docs,
> > + * this bit is not validated and shouldn't be used. However, it
> seems
> > + * to work fine and this is exactly the type of thing you would use
> such
> > + *
> > + * BRW_NEW_FS_PROG_DATA | _NEW_BUFFERS | _NEW_COLOR
> > + */
> > + if ((wm_prog_data->has_side_effects || wm_prog_data->uses_kill) &&
> > + !brw_color_buffer_write_enabled(brw))
> > + dw1 |= GEN8_WM_FORCE_THREAD_DISPATCH_ON;
> > +
>
> According to the docs the ForceThreadDispatchEnable bits will override
> the WM ThreadDispatchEnable signal to cause fragment shader dispatch
> even during the execution of a hi-Z op. Doesn't this mean that you now
> need to re-emit a 3DSTATE_WM packet prior to the execution of a hi-Z op
> to make sure the ForceThreadDispatchEnable field is clear?
That's an interesting question. I had sort-of assumed that if you used the
WM_HZ_OP packet, it would implicitly re-emit 3DSTATE_WM and that the
comment on 3DSTATE_WM::ForceThreadDispatchEnable only applied to hiz ops.
Now that I read things a bit more carefully, it appears that the formula
explicitly includes both 3DSTATE_WM and WM_HZ_OP so there probably is an
interaction there.
> Also, the
> PMA fix seems to be sensitive on the state of ForceThreadDispatchEnable,
> but I don't think this will affect the actual calculation because it
> only cares about thread dispatch being forced off. You may want to
> update the comment in gen8_depth_state though.
>
I think the PMA fix is fine but a comment may be in order.
> > BEGIN_BATCH(2);
> > OUT_BATCH(_3DSTATE_WM << 16 | (2 - 2));
> > OUT_BATCH(dw1);
> > @@ -177,7 +157,9 @@ upload_wm_state(struct brw_context *brw)
> >
> > const struct brw_tracked_state gen8_wm_state = {
> > .dirty = {
> > - .mesa = _NEW_LINE |
> > + .mesa = _NEW_BUFFERS |
> > + _NEW_COLOR |
> > + _NEW_LINE |
> > _NEW_POLYGON,
> > .brw = BRW_NEW_BLORP |
> > BRW_NEW_CONTEXT |
> > --
> > 2.5.0.400.gff86faf
> >
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev at lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170213/21b13d1c/attachment-0001.html>
More information about the mesa-dev
mailing list