[Mesa-dev] [PATCH V2 10/11] genX/cmd_buffer: Enable fast depth clears

Nanley Chery nanleychery at gmail.com
Wed Oct 5 23:30:51 UTC 2016


On Tue, Sep 27, 2016 at 02:47:22PM -0700, Nanley Chery wrote:
> On Tue, Sep 27, 2016 at 11:00:21AM -0700, Chad Versace wrote:
> > On Mon 26 Sep 2016, Nanley Chery wrote:
> > > From: Nanley Chery <nanleychery at gmail.com>
> > > 
> > > Provides an FPS increase of ~30% on the Sascha triangle and multisampling
> > > demos.
> > > 
> > > Clears that happen within a render pass via vkCmdClearAttachments are safe
> > > even if the clear color changes. This is because the meta implementation does
> > > not use LOAD_OP_CLEAR which avoids any conflicts with 3DSTATE_CLEAR_PARAMS.
> > > 
> > > Signed-off-by: Nanley Chery <nanley.g.chery at intel.com>
> > > Reviewed-by: Jason Ekstrand <jason at jlekstrand.net>
> > > 
> > > ---
> > > 
> > > v2. Update granularity comment for accuracy
> > > 
> > >  src/intel/vulkan/anv_pass.c        | 13 +++++++++++++
> > >  src/intel/vulkan/gen8_cmd_buffer.c |  6 ++++++
> > >  src/intel/vulkan/genX_cmd_buffer.c |  4 +---
> > >  3 files changed, 20 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/src/intel/vulkan/anv_pass.c b/src/intel/vulkan/anv_pass.c
> > > index 69c3c7e..595c2ea 100644
> > > --- a/src/intel/vulkan/anv_pass.c
> > > +++ b/src/intel/vulkan/anv_pass.c
> > > @@ -155,5 +155,18 @@ void anv_GetRenderAreaGranularity(
> > >      VkRenderPass                                renderPass,
> > >      VkExtent2D*                                 pGranularity)
> > >  {
> > > +   ANV_FROM_HANDLE(anv_render_pass, pass, renderPass);
> > > +
> > > +   /* This granularity satisfies HiZ fast clear alignment requirements
> > > +    * for all sample counts.
> > > +    */
> > > +   for (unsigned i = 0; i < pass->subpass_count; ++i) {
> > > +      if (pass->subpasses[i].depth_stencil_attachment !=
> > > +          VK_ATTACHMENT_UNUSED) {
> > > +         *pGranularity = (VkExtent2D) { .width = 8, .height = 4 };
> > > +         return;
> > > +      }
> > > +   }
> > > +
> > >     *pGranularity = (VkExtent2D) { 1, 1 };
> > >  }
> > > diff --git a/src/intel/vulkan/gen8_cmd_buffer.c b/src/intel/vulkan/gen8_cmd_buffer.c
> > > index 14e6a7b..96e972c 100644
> > > --- a/src/intel/vulkan/gen8_cmd_buffer.c
> > > +++ b/src/intel/vulkan/gen8_cmd_buffer.c
> > > @@ -479,6 +479,12 @@ genX(cmd_buffer_do_hz_op)(struct anv_cmd_buffer *cmd_buffer,
> > >               cmd_state->render_area.extent.height % px_dim.h)
> > >              return;
> > >        }
> > > +
> > > +      anv_batch_emit(&cmd_buffer->batch, GENX(3DSTATE_CLEAR_PARAMS), cp) {
> > > +         cp.DepthClearValueValid = true;
> > > +         cp.DepthClearValue =
> > > +            cmd_buffer->state.attachments[ds].clear_value.depthStencil.depth;
> > > +      }
> > >        break;
> > >     case BLORP_HIZ_OP_DEPTH_RESOLVE:
> > >        if (cmd_buffer->state.pass->attachments[ds].store_op !=
> > > diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c
> > > index 2cb1539..290fefc 100644
> > > --- a/src/intel/vulkan/genX_cmd_buffer.c
> > > +++ b/src/intel/vulkan/genX_cmd_buffer.c
> > > @@ -1320,9 +1320,6 @@ cmd_buffer_emit_depth_stencil(struct anv_cmd_buffer *cmd_buffer)
> > >     } else {
> > >        anv_batch_emit(&cmd_buffer->batch, GENX(3DSTATE_STENCIL_BUFFER), sb);
> > >     }
> > > -
> > > -   /* Clear the clear params. */
> > > -   anv_batch_emit(&cmd_buffer->batch, GENX(3DSTATE_CLEAR_PARAMS), cp);
> > 
> > We may need to preserve emission of 3DSTATE_CLEAR_PARAMS here. Two reasons:
> > 
> >     Reason 1. If hiz is enabled in the 3DSTATE_DEPTH_BUFFER, and the hiz
> >        surface has some bits in the clear state, and 3DSTATE_CLEAR_PARAMS.DepthClearValueValid is 0,
> >        and we emit a draw call, what does the hardware do when it
> >        accesses a cleard pixel? I don't want to find out.
> > 
> 
> Good point.
> 

I thought about this some more and came to the conclusion that this
shouldn't be a problem. In this V2, one of the two initialization
requirements of the clear value (quoted below) is performed whenever
a pixel is cleared in the depth buffer.

> >     Reason 2. The PRM says we have to (though, to be honest, I don't trust the PRM's logic).
> > 
> >         From the Skylake PRM >> Vol7: 3D-Media-GPGUP >> Section: Hierarchical Depth Buffer:
> >         | 
> >         |  If HiZ is enabled, you must initialize the clear value by either:
> >         | 
> >         |     1. Perform a depth clear pass to initialize the clear value.
> >         |     2. Send a 3dstate_clear_params packet with valid = 1.
> >         | 
> >         |  Without one of these events, context switching will fail, as it will try
> >         |  to save off a clear value even though no valid clear value has been set.
> >         |  When context restore happens, HW will restore an uninitialized clear value.
> >     
> >         Even though the hardware docs claim we need 3DSTATE_CLEAR_PARAMS when hiz is
> >         enabled, the docs are vague about the consequences. Does context switching
> >         really fail, as claimed by #1? Or does context switching actually succeed, but
> >         context restore gives us an invalid clear value (which doesn't hurt us), as
> >         claimed by #2? Oh hw docs... :/
> > 
> 
> I didn't trust the logic as well, but I agree. It's good to keep the
> diff as small as reasonably possible. 
> 

Like Jason mentioned in another email I think the PRM is saying that the
saved value isn't trustworthy and so saving off the current context will
fail in a sense. #2 discusses context restoration, implying that a
context save did in fact occur and another context was switched to.

> > As a consequence of that reasoning, we should set 3DSTATE_CLEAR_PARAMS.DepthClearValueValid = 1 
> > whenever hiz is enabled, even if we don't care about the actual clear value.

Testing shows that we cannot emit clear_params packets back-to-back,
so unconditionally emitting an arbitrary value here will override the
actual clear value we want to set later. Trying to emit the actual clear
value here causes segfaults that seem to stem from an interaction with
secondary command buffers. While I could try to add conditions to work
around this, my current understanding that changing this code is not
necessary.

> 
> In the V3, I plan to emit that packet once at device initialization time
> HSW+, and to always emit it (in the expected location) for IVB/BYT. Only
> the latter platforms have the restriction that it must always be
> programmed with the other depth/stencil commands.


More information about the mesa-dev mailing list