[Mesa-dev] [PATCH] i965: Whack UAV bit when FS discards and there are no color writes.

Jason Ekstrand jason at jlekstrand.net
Fri Mar 25 16:44:55 UTC 2016


On Thu, Mar 24, 2016 at 7:45 PM, Francisco Jerez <currojerez at riseup.net>
wrote:

> Francisco Jerez <currojerez at riseup.net> writes:
>
> > Jason Ekstrand <jason at jlekstrand.net> writes:
> >
> >> On Thu, Mar 24, 2016 at 5:50 PM, Kenneth Graunke <kenneth at whitecape.org
> >
> >> wrote:
> >>
> >>> dEQP-GLES31.functional.fbo.no_attachments.* draws a quad with no
> >>> framebuffer attachments, using a shader that discards based on
> >>> gl_FragCoord.  It uses occlusion queries to inspect whether pixels
> >>> are rendered or not.
> >>>
> >>> Unfortunately, the hardware is not dispatching any pixel shaders,
> >>> so discards never happen, and the full quad of pixels increments
> >>> PS_DEPTH_COUNT, making the occlusion query results bogus.
> >>>
> >>> To understand why, we have to delve into the WM_INT internal
> >>> signalling mechanism's formulas.
> >>>
> >>> The "WM_INT::Pixel Shader Kill Pixel" signal is defined as:
> >>>
> >>>     3DSTATE_WM::ForceKillPixel == ON ||
> >>>     (3DSTATE_WM::ForceKillPixel != Off &&
> >>>      !WM_INT::WM_HZ_OP &&
> >>>      3DSTATE_WM::EDSC_Mode != PREPS &&
> >>>      (WM_INT::Depth Write Enable || WM_INT::Stencil Write Enable) &&
> >>>      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >>>      (3DSTATE_PS_EXTRA::PixelShaderKillsPixels ||
> >>>       3DSTATE_PS_EXTRA:: oMask Present to RenderTarget ||
> >>>       3DSTATE_PS_BLEND::AlphaToCoverageEnable ||
> >>>       3DSTATE_PS_BLEND::AlphaTestEnable ||
> >>>       3DSTATE_WM_CHROMAKEY::ChromaKeyKillEnable))
> >>>
> >>> Because there is no depth or stencil buffer, writes to those buffers
> >>> are disabled.  So the highlighted condition is false, making the whole
> >>> "Kill Pixel" condition false.  This then feeds into the following
> >>> "WM_INT::ThreadDispatchEnable" condition:
> >>>
> >>>     3DSTATE_WM::ForceThreadDispatch != OFF &&
> >>>     !WM_INT::WM_HZ_OP &&
> >>>     3DSTATE_PS_EXTRA::PixelShaderValid &&
> >>>     (3DSTATE_PS_EXTRA::PixelShaderHasUAV ||
> >>>      WM_INT::Pixel Shader Kill Pixel ||
> >>>      WM_INT::RTIndependentRasterizationEnable ||
> >>>      (!3DSTATE_PS_EXTRA::PixelShaderDoesNotWriteRT &&
> >>>       3DSTATE_PS_BLEND::HasWriteableRT) ||
> >>>      (WM_INT::Pixel Shader Computed Depth Mode != PSCDEPTH_OFF &&
> >>>       (WM_INT::Depth Test Enable || WM_INT::Depth Write Enable)) ||
> >>>      (3DSTATE_PS_EXTRA::Computed Stencil && WM_INT::Stencil Test
> Enable) ||
> >>>      (3DSTATE_WM::EDSC_Mode == 1 && (WM_INT::Depth Test Enable ||
> >>>                                      WM_INT::Depth Write Enable ||
> >>>                                      WM_INT::Stencil Test Enable)))
> >>>
> >>> Given that there's no depth/stencil testing, no writeable render
> target,
> >>> and the hardware thinks kill pixel doesn't happen, all of these
> >>> conditions are false.  We have to whack some bit to make PS invocations
> >>> happen.  There are many options.
> >>>
> >>> Curro suggested using the UAV bit.  There's some precedence in doing
> >>> that - we set it for fragment shaders that do SSBO/image/atomic writes
> >>> when no color buffer writes are enabled.  We can simply include discard
> >>> here too.
> >>>
> >>> Fixes 64 dEQP-GLES31.functional.fbo.no_attachments.* tests.
> >>>
> >>> Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
> >>> ---
> >>>  src/mesa/drivers/dri/i965/gen8_ps_state.c | 4 ++--
> >>>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/src/mesa/drivers/dri/i965/gen8_ps_state.c
> >>> b/src/mesa/drivers/dri/i965/gen8_ps_state.c
> >>> index b9a06e7..cda8159 100644
> >>> --- a/src/mesa/drivers/dri/i965/gen8_ps_state.c
> >>> +++ b/src/mesa/drivers/dri/i965/gen8_ps_state.c
> >>> @@ -93,8 +93,8 @@ gen8_upload_ps_extra(struct brw_context *brw,
> >>>      *
> >>>      * BRW_NEW_FS_PROG_DATA | BRW_NEW_FRAGMENT_PROGRAM | _NEW_BUFFERS |
> >>> _NEW_COLOR
> >>>      */
> >>> -   if (_mesa_active_fragment_shader_has_side_effects(&brw->ctx) &&
> >>> -       !brw_color_buffer_write_enabled(brw))
> >>> +   if ((_mesa_active_fragment_shader_has_side_effects(ctx) ||
> >>> +        prog_data->uses_kill) && !brw_color_buffer_write_enabled(brw))
> >>>
> >>
> >> Whould you mind adding a short comment as to why uses_kill is in here?
> You
> >> don't need the full explination above, bot something along the lines of
> >>
> >> "Gen8 hardware tries to compute ThreadDispatchEnable for us but doesn't
> >> take into account KillPixels when no depth or stencil writes are
> enabled.
> >> In order for occlusion queries to work correctly with no attachments, we
> >> need to force-enable here."
> >>
> > That sounds good to me.
> >
> >> You can come up with your own text if you'd like.
> >>
> >> One other thought: Do we know if HAS_UAV has any perf impact other than
> >> actually running the shader?  I would guess not, but if it does we may
> want
> >> to make the condition a bit more specifi.
> >>
> >
> > It does, it causes an implicit DC flush at the bottom of the pipeline,
> > see the comment right on top of this code for a more detailed
> > explanation.  It would definitely be possible to make the check more
> > specific by e.g. setting the bit only if both stencil and depth writes
> > are disabled (in addition to color buffer writes), but Ken had some
> > concerns about the additional state dependencies that would introduce
> > when I brought up the same question earlier today in the office.  Anyway
> > this only gets turned on when no color buffer writes are enabled so it
> > doesn't seem like a huge deal, but it would of course be nice to do the
> > additional checks.
> >
> Let me elaborate a bit more why I don't think setting the has-UAV bit in
> this case is a big deal: On the one hand, unless the application is
> using shader images, SSBOs or atomics the implicit DC flush shouldn't be
> too expensive because the DC is likely to be already close to being
> coherent with the L3, so the flush shouldn't do much work -- However if
> the shader *is* using images, SSBOs or atomics the has_side_effects
> check in the same conditional will evaluate to true and the has-UAV bit
> will be set (or not) regardless of the additional uses_kill condition.
> On the other hand, because we don't use the UAV coherency mechanism (and
> are unlikely to use it anytime soon), the DC flush is asynchronous and
> doesn't lead to a pipeline stall on subsequent geometry processing, so
> it shouldn't hurt much anyway.
>

Right, but what about the dept-stencil-only-with-a-discard case?  In that
case, we don't need to force pixel shader dispatch because KillsPixels will
enable it and we would really rather not have a DC flush after every
3DPRIMITIVE.


> BTW, if you're curious about the workings of the UAV coherency
> mechanism, I wrote most of what I could find out in the commit message
> of 5346c1167064d6429c6338974c6342f8346fd34b.  I haven't found any decent
> documentation about it anywhere else...
>
> >> Other than that, looks good to me.
> >> --Jason
> >>
> >>
> >>>        dw1 |= GEN8_PSX_SHADER_HAS_UAV;
> >>>
> >>>     if (prog_data->computed_stencil) {
> >>> --
> >>> 2.7.4
> >>>
> >>> _______________________________________________
> >>> mesa-dev mailing list
> >>> mesa-dev at lists.freedesktop.org
> >>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> >>>
> >> _______________________________________________
> >> mesa-dev mailing list
> >> mesa-dev at lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20160325/2a224755/attachment-0001.html>


More information about the mesa-dev mailing list