[Mesa-dev] [PATCH 1/2] i965/ps: Use ForceThreadDispatchEnable instead of AccessUAV.

Kenneth Graunke kenneth at whitecape.org
Wed Feb 15 02:11:52 UTC 2017


On Monday, February 13, 2017 6:58:39 PM PST Jason Ekstrand wrote:
> On Mon, Feb 13, 2017 at 6:35 PM, Francisco Jerez <currojerez at riseup.net>
> wrote:
> 
> > Jason Ekstrand <jason at jlekstrand.net> writes:
> >
> > > The AccessUAV bit is not quite what we want because it's more about
> > > coherency between storage operations than it is about dispatch.  Also,
> > > the 3DSTATE_PS_EXTRA::PixelShaderHasUAV bit seems to cause hangs on
> > > Broadwell for unknown reasons so it's best to just leave it off.  The
> > > 3DSTATE_WM::ForceThreadDispatchEnable bit, on the other hand, does
> > > exactly what we want and seems to work fine.
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_defines.h   |  2 ++
> > >  src/mesa/drivers/dri/i965/gen8_ps_state.c | 52
> > ++++++++++---------------------
> > >  2 files changed, 19 insertions(+), 35 deletions(-)
> > >
> > > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h
> > b/src/mesa/drivers/dri/i965/brw_defines.h
> > > index 3c5c6c4..9ad36ca 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_defines.h
> > > +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> > > @@ -2733,6 +2733,8 @@ enum brw_barycentric_mode {
> > >  # define GEN7_WM_MSDISPMODE_PERSAMPLE                        (0 << 31)
> > >  # define GEN7_WM_MSDISPMODE_PERPIXEL                 (1 << 31)
> > >  # define HSW_WM_UAV_ONLY                                (1 << 30)
> > > +# define GEN8_WM_FORCE_THREAD_DISPATCH_OFF              (1 << 19)
> > > +# define GEN8_WM_FORCE_THREAD_DISPATCH_ON               (2 << 19)
> > >
> > >  #define _3DSTATE_PS                          0x7820 /* GEN7+ */
> > >  /* DW1: kernel pointer */
> > > diff --git a/src/mesa/drivers/dri/i965/gen8_ps_state.c
> > b/src/mesa/drivers/dri/i965/gen8_ps_state.c
> > > index 0346826..cca57e6 100644
> > > --- a/src/mesa/drivers/dri/i965/gen8_ps_state.c
> > > +++ b/src/mesa/drivers/dri/i965/gen8_ps_state.c
> > > @@ -74,39 +74,6 @@ gen8_upload_ps_extra(struct brw_context *brw,
> > >     if (brw->gen >= 9 && prog_data->pulls_bary)
> > >        dw1 |= GEN9_PSX_SHADER_PULLS_BARY;
> > >
> > > -   /* The stricter cross-primitive coherency guarantees that the
> > hardware
> > > -    * gives us with the "Accesses UAV" bit set for at least one shader
> > stage
> > > -    * and the "UAV coherency required" bit set on the 3DPRIMITIVE
> > command are
> > > -    * redundant within the current image, atomic counter and SSBO GL
> > APIs,
> > > -    * which all have very loose ordering and coherency requirements and
> > > -    * generally rely on the application to insert explicit barriers
> > when a
> > > -    * shader invocation is expected to see the memory writes performed
> > by the
> > > -    * invocations of some previous primitive.  Regardless of the value
> > of "UAV
> > > -    * coherency required", the "Accesses UAV" bits will implicitly
> > cause an in
> > > -    * most cases useless DC flush when the lowermost stage with the bit
> > set
> > > -    * finishes execution.
> > > -    *
> > > -    * It would be nice to disable it, but in some cases we can't
> > because on
> > > -    * Gen8+ it also has an influence on rasterization via the PS
> > UAV-only
> > > -    * signal (which could be set independently from the coherency
> > mechanism in
> > > -    * the 3DSTATE_WM command on Gen7), and because in some cases it will
> > > -    * determine whether the hardware skips execution of the fragment
> > shader or
> > > -    * not via the ThreadDispatchEnable signal.  However if we know that
> > > -    * GEN8_PS_BLEND_HAS_WRITEABLE_RT is going to be set and
> > > -    * GEN8_PSX_PIXEL_SHADER_NO_RT_WRITE is not set it shouldn't make
> > any
> > > -    * difference so we may just disable it here.
> > > -    *
> > > -    * Gen8 hardware tries to compute ThreadDispatchEnable for us but
> > doesn't
> > > -    * take into account KillPixels when no depth or stencil writes are
> > enabled.
> > > -    * In order for occlusion queries to work correctly with no
> > attachments, we
> > > -    * need to force-enable here.
> > > -    *
> > > -    * BRW_NEW_FS_PROG_DATA | BRW_NEW_FRAGMENT_PROGRAM | _NEW_BUFFERS |
> > _NEW_COLOR
> > > -    */
> > > -   if ((prog_data->has_side_effects || prog_data->uses_kill) &&
> > > -       !brw_color_buffer_write_enabled(brw))
> > > -      dw1 |= GEN8_PSX_SHADER_HAS_UAV;
> > > -
> > >     if (prog_data->computed_stencil) {
> > >        assert(brw->gen >= 9);
> > >        dw1 |= GEN9_PSX_SHADER_COMPUTES_STENCIL;
> > > @@ -127,7 +94,6 @@ upload_ps_extra(struct brw_context *brw)
> > >
> > >  const struct brw_tracked_state gen8_ps_extra = {
> > >     .dirty = {
> > > -      .mesa  = _NEW_BUFFERS | _NEW_COLOR,
> > >        .brw   = BRW_NEW_BLORP |
> > >                 BRW_NEW_CONTEXT |
> > >                 BRW_NEW_FRAGMENT_PROGRAM |
> > > @@ -169,6 +135,20 @@ upload_wm_state(struct brw_context *brw)
> > >     else if (wm_prog_data->has_side_effects)
> > >        dw1 |= GEN7_WM_EARLY_DS_CONTROL_PSEXEC;
> > >
> > > +   /* In most cases, the computation for the
> > WM_INT::ThreadDispatchEnable
> > > +    * does exactly what you want.  However, when you don't have any
> > color
> > > +    * buffers or when depth/stencil writes are disabled, it misses a few
> > > +    * cases.  In those cases, we force it to dispatch the PS by using
> > the
> > > +    * 3DSTATE_WM::ForceThreadDispatchEnable bit.  According to the
> > docs,
> > > +    * this bit is not validated and shouldn't be used.  However, it
> > seems
> > > +    * to work fine and this is exactly the type of thing you would use
> > such
> > > +    *
> > > +    * BRW_NEW_FS_PROG_DATA | _NEW_BUFFERS | _NEW_COLOR
> > > +    */
> > > +   if ((wm_prog_data->has_side_effects || wm_prog_data->uses_kill) &&
> > > +       !brw_color_buffer_write_enabled(brw))
> > > +      dw1 |= GEN8_WM_FORCE_THREAD_DISPATCH_ON;
> > > +
> >
> > According to the docs the ForceThreadDispatchEnable bits will override
> > the WM ThreadDispatchEnable signal to cause fragment shader dispatch
> > even during the execution of a hi-Z op.  Doesn't this mean that you now
> > need to re-emit a 3DSTATE_WM packet prior to the execution of a hi-Z op
> > to make sure the ForceThreadDispatchEnable field is clear?
> 
> 
> That's an interesting question.  I had sort-of assumed that if you used the
> WM_HZ_OP packet, it would implicitly re-emit 3DSTATE_WM and that the
> comment on 3DSTATE_WM::ForceThreadDispatchEnable only applied to hiz ops.
> Now that I read things a bit more carefully, it appears that the formula
> explicitly includes both 3DSTATE_WM and WM_HZ_OP so there probably is an
> interaction there.

Yeah, I don't think 3DSTATE_WM_HZ_OP implicitly emits other state
packets (just a dummy primitive with some vertices).  It sets some
state flags which are delivered via 3DSTATE_INT (internal) to other
units, which those units use when computing the formulas for
derived/calculated state.

So I think Curro's right - we may need to whack 3DSTATE_WM before
3DSTATE_WM_HZ_OP if dispatching was previously forced on. :(
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part.
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170214/b872e960/attachment.sig>


More information about the mesa-dev mailing list