[Mesa-dev] [PATCH] i965: Whack UAV bit when FS discards and there are no color writes.

Francisco Jerez currojerez at riseup.net
Fri Mar 25 02:45:25 UTC 2016


Francisco Jerez <currojerez at riseup.net> writes:

> Jason Ekstrand <jason at jlekstrand.net> writes:
>
>> On Thu, Mar 24, 2016 at 5:50 PM, Kenneth Graunke <kenneth at whitecape.org>
>> wrote:
>>
>>> dEQP-GLES31.functional.fbo.no_attachments.* draws a quad with no
>>> framebuffer attachments, using a shader that discards based on
>>> gl_FragCoord.  It uses occlusion queries to inspect whether pixels
>>> are rendered or not.
>>>
>>> Unfortunately, the hardware is not dispatching any pixel shaders,
>>> so discards never happen, and the full quad of pixels increments
>>> PS_DEPTH_COUNT, making the occlusion query results bogus.
>>>
>>> To understand why, we have to delve into the WM_INT internal
>>> signalling mechanism's formulas.
>>>
>>> The "WM_INT::Pixel Shader Kill Pixel" signal is defined as:
>>>
>>>     3DSTATE_WM::ForceKillPixel == ON ||
>>>     (3DSTATE_WM::ForceKillPixel != Off &&
>>>      !WM_INT::WM_HZ_OP &&
>>>      3DSTATE_WM::EDSC_Mode != PREPS &&
>>>      (WM_INT::Depth Write Enable || WM_INT::Stencil Write Enable) &&
>>>      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>      (3DSTATE_PS_EXTRA::PixelShaderKillsPixels ||
>>>       3DSTATE_PS_EXTRA:: oMask Present to RenderTarget ||
>>>       3DSTATE_PS_BLEND::AlphaToCoverageEnable ||
>>>       3DSTATE_PS_BLEND::AlphaTestEnable ||
>>>       3DSTATE_WM_CHROMAKEY::ChromaKeyKillEnable))
>>>
>>> Because there is no depth or stencil buffer, writes to those buffers
>>> are disabled.  So the highlighted condition is false, making the whole
>>> "Kill Pixel" condition false.  This then feeds into the following
>>> "WM_INT::ThreadDispatchEnable" condition:
>>>
>>>     3DSTATE_WM::ForceThreadDispatch != OFF &&
>>>     !WM_INT::WM_HZ_OP &&
>>>     3DSTATE_PS_EXTRA::PixelShaderValid &&
>>>     (3DSTATE_PS_EXTRA::PixelShaderHasUAV ||
>>>      WM_INT::Pixel Shader Kill Pixel ||
>>>      WM_INT::RTIndependentRasterizationEnable ||
>>>      (!3DSTATE_PS_EXTRA::PixelShaderDoesNotWriteRT &&
>>>       3DSTATE_PS_BLEND::HasWriteableRT) ||
>>>      (WM_INT::Pixel Shader Computed Depth Mode != PSCDEPTH_OFF &&
>>>       (WM_INT::Depth Test Enable || WM_INT::Depth Write Enable)) ||
>>>      (3DSTATE_PS_EXTRA::Computed Stencil && WM_INT::Stencil Test Enable) ||
>>>      (3DSTATE_WM::EDSC_Mode == 1 && (WM_INT::Depth Test Enable ||
>>>                                      WM_INT::Depth Write Enable ||
>>>                                      WM_INT::Stencil Test Enable)))
>>>
>>> Given that there's no depth/stencil testing, no writeable render target,
>>> and the hardware thinks kill pixel doesn't happen, all of these
>>> conditions are false.  We have to whack some bit to make PS invocations
>>> happen.  There are many options.
>>>
>>> Curro suggested using the UAV bit.  There's some precedence in doing
>>> that - we set it for fragment shaders that do SSBO/image/atomic writes
>>> when no color buffer writes are enabled.  We can simply include discard
>>> here too.
>>>
>>> Fixes 64 dEQP-GLES31.functional.fbo.no_attachments.* tests.
>>>
>>> Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
>>> ---
>>>  src/mesa/drivers/dri/i965/gen8_ps_state.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/gen8_ps_state.c
>>> b/src/mesa/drivers/dri/i965/gen8_ps_state.c
>>> index b9a06e7..cda8159 100644
>>> --- a/src/mesa/drivers/dri/i965/gen8_ps_state.c
>>> +++ b/src/mesa/drivers/dri/i965/gen8_ps_state.c
>>> @@ -93,8 +93,8 @@ gen8_upload_ps_extra(struct brw_context *brw,
>>>      *
>>>      * BRW_NEW_FS_PROG_DATA | BRW_NEW_FRAGMENT_PROGRAM | _NEW_BUFFERS |
>>> _NEW_COLOR
>>>      */
>>> -   if (_mesa_active_fragment_shader_has_side_effects(&brw->ctx) &&
>>> -       !brw_color_buffer_write_enabled(brw))
>>> +   if ((_mesa_active_fragment_shader_has_side_effects(ctx) ||
>>> +        prog_data->uses_kill) && !brw_color_buffer_write_enabled(brw))
>>>
>>
>> Whould you mind adding a short comment as to why uses_kill is in here?  You
>> don't need the full explination above, bot something along the lines of
>>
>> "Gen8 hardware tries to compute ThreadDispatchEnable for us but doesn't
>> take into account KillPixels when no depth or stencil writes are enabled.
>> In order for occlusion queries to work correctly with no attachments, we
>> need to force-enable here."
>>
> That sounds good to me.
>
>> You can come up with your own text if you'd like.
>>
>> One other thought: Do we know if HAS_UAV has any perf impact other than
>> actually running the shader?  I would guess not, but if it does we may want
>> to make the condition a bit more specifi.
>>
>
> It does, it causes an implicit DC flush at the bottom of the pipeline,
> see the comment right on top of this code for a more detailed
> explanation.  It would definitely be possible to make the check more
> specific by e.g. setting the bit only if both stencil and depth writes
> are disabled (in addition to color buffer writes), but Ken had some
> concerns about the additional state dependencies that would introduce
> when I brought up the same question earlier today in the office.  Anyway
> this only gets turned on when no color buffer writes are enabled so it
> doesn't seem like a huge deal, but it would of course be nice to do the
> additional checks.
>
Let me elaborate a bit more why I don't think setting the has-UAV bit in
this case is a big deal: On the one hand, unless the application is
using shader images, SSBOs or atomics the implicit DC flush shouldn't be
too expensive because the DC is likely to be already close to being
coherent with the L3, so the flush shouldn't do much work -- However if
the shader *is* using images, SSBOs or atomics the has_side_effects
check in the same conditional will evaluate to true and the has-UAV bit
will be set (or not) regardless of the additional uses_kill condition.
On the other hand, because we don't use the UAV coherency mechanism (and
are unlikely to use it anytime soon), the DC flush is asynchronous and
doesn't lead to a pipeline stall on subsequent geometry processing, so
it shouldn't hurt much anyway.

BTW, if you're curious about the workings of the UAV coherency
mechanism, I wrote most of what I could find out in the commit message
of 5346c1167064d6429c6338974c6342f8346fd34b.  I haven't found any decent
documentation about it anywhere else...

>> Other than that, looks good to me.
>> --Jason
>>
>>
>>>        dw1 |= GEN8_PSX_SHADER_HAS_UAV;
>>>
>>>     if (prog_data->computed_stencil) {
>>> --
>>> 2.7.4
>>>
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20160324/5949f5e5/attachment.sig>


More information about the mesa-dev mailing list