[Mesa-dev] [PATCH 16/22] anv/hiz: Perform HiZ resolves for all partial renders

Wed Oct 12 16:01:10 UTC 2016

On Tue, Oct 11, 2016 at 06:55:53PM -0700, Jason Ekstrand wrote:
> On Tue, Oct 11, 2016 at 6:16 PM, Nanley Chery <nanleychery at gmail.com> wrote:
> 
> > On Mon, Oct 10, 2016 at 06:00:49PM -0700, Jason Ekstrand wrote:
> > > On Mon, Oct 10, 2016 at 2:23 PM, Nanley Chery <nanleychery at gmail.com>
> > wrote:
> > >
> > > > On Fri, Oct 07, 2016 at 09:41:14PM -0700, Jason Ekstrand wrote:
> > > > > If we don't, we can end up with corruption in the portion of the
> > depth
> > > > > buffer that lies outside the render area when we do a HiZ resolve at
> > the
> > > > > end.  The only reason we weren't seeing this before was that all of
> > the
> > > > > meta-based clears such as VkCmdClearDepthStencilImage were internally
> > > > using
> > > > > HiZ so the HiZ buffer never truly got out-of-sync.  If the CTS ever
> > > > tested
> > > > > a depth upload (which doesn't care about HiZ) and then a partial
> > render
> > > > we
> > > > > would have seen problems.  Soon, we will be using blorp to do depth
> > > > clears
> > > > > and it won't bother with HiZ so we would get CTS regressions without
> > > > this.
> > > > >
> > > >
> > > > I understand the problem, but I think this solution unnecessarily
> > > > penalizes the user's renderpass.
> > > >
> > > > Since depth buffer updates via vkCopy*ToImage and
> > > > vkCmdClearDepthStencilImage cause the HiZ buffer to become stale,
> > > > calling
> > > >
> > > > genX(cmd_buffer_emit_hz_op)(cmd_buffer, BLORP_HIZ_OP_HIZ_RESOLVE);
> > > >
> > > > at the bottom of those commands should fix the issue without the extra
> > > > penalty. I'd imagine that as a prequisite, blorp would have to learn to
> > > > emit enough depth stencil state for this command.
> > > >
> > >
> > > I think that's dangerously mixing HiZ data validity models.  There are 3
> > > basic aux data validity models that we've thrown around:
> > >
> > >  1) AUX is always correct.
> > >  2) AUX is correct within a render pass and invalid outside.
> > >  3) Track whether or not AUX is valid and resolve only as needed.
> > >
> >
> > What is the definition of correct here? I'd assume you mean that the
> > data matches what's in the depth buffer, but that sometimes may not be
> > the case (STORE_OP_DONTCARE) yet the program behavior is correct
> > nonetheless.
> >
> 
> By "correct" I mean "consistent with the depth buffer" or, more precicely,
> "all well-defined pixels of the depth buffer are consistent with the HiZ
> buffer".  We *may* be able to avoid the depth resolve at the end if you
> have STORE_OP_DONT_CARE.  However, we would probably not do anything
> interesting with LOAD_OP_DONT_CARE.
> 

With this definition of correct (accessing either buffer will give you
the correct value due to their being consistent with each other), the
current implementation is arguably a course-grained version of (3) (no
tracking, let's call this 4) than it is (2). The HiZ buffer is only
consistent with the depth buffer when a user performs an operation that
likely requires it to be so. For example:

* LOAD_OP_LOAD -> HiZ Resolve (consistent)
* LOAD_OP_CLEAR -> No resolve, Fast Depth Clear (inconsistent)
* vkCmdDraw* -> No resolve (inconsistent)
* STORE_OP_STORE -> Depth Resolve (consistent)

> 
> > Also, could you please explain where the danger comes into play?
> >
> 
> We need to have a solid mental model of when HiZ and depth are consistent.
> Otherwise, we'll make mistakes, things will get inconsistent, and we'll

Agreed.

> have weird bugs.  This bug is a good example of this.  Our mental model (2)
> works fine except that we were leaking garbage depth from DONT_CARE when we
> have a partial areat.  Just doing a HiZ resolve after a blorp clear "fixes"
> the bug by making things always consistent (mental model 1).  But then it

As mentioned above, I'm not advocating mixing 1 and 2, but covering a
missed case in 4. Whether or not that mental model is solid seems like
a subjective claim.

> means that we have LOAD_OP_LOAD, we're doing two HiZ resolves which we
> don't want either.
> 

I wouldn't expect Vulkan apps to submit image copies as frequently as
render passes, so my thinking is that an extra HiZ resolve at the end of
an Image copy should have less of an impact on FPS than performing the
resolve on every clearing RP that doesn't use a full render area. I
did not write the patch to test my suggestion, but I was able to get a
measurable difference by forcing HiZ resolves on the triangle demo. I
won't post the numbers I obtained from the questionable method of
eye-balling the FPS counter (especially with the amount of variance
involved), so feel free to take a look at it yourself.  

I think the temporarily minor dip in performance introduced here is a
small price to pay for the removal of meta. I also think my suggestion
may be a lot more work than it seemed initially, so it's likely best to
revisit this later.

This patch is:
Reviewed-by: Nanley Chery <nanley.g.chery at intel.com>

> As I intended to say below, I don't mind moving to (1), but if that's what
> we want to do we should commit to it and change rather than going with
> half-and-half.
> 
> 
> > > So far, our initial enabling strategy has been (2).  We can move to (1)
> > or
> > > maybe even (3) with layout transitions, but that's not what we've done so
> > > far.  Your suggestion is to mix in a little of (1) because there is a bug
> > > in our implementation of (2).
> > >
> > > I'm Ok, for gen8+ HiZ, with moving to (2) eventually since gen8 is
> > capable
> >                                          ^
> >                                          I'm not sure which model you're
> >                                          referring to here since you
> >                                          said this was our initial
> >                                          enabling model.
> >
> 
> I meant "moving to (1)".  Sorry for the typo.
> 
> 
> >
> > -Nanley
> >
> > > of sampling with HiZ.  (Maybe we can even move to (3).)  However, that's
> > > not where we are right now and I don't really want to start mixing mental
> > > models.
> > >
> > > --Jason
> > >
> > >
> > > > -Nanley
> > > >
> > > > > Signed-off-by: Jason Ekstrand <jason at jlekstrand.net>
> > > > > ---
> > > > >  src/intel/vulkan/gen8_cmd_buffer.c | 12 +++++++++++-
> > > > >  1 file changed, 11 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/src/intel/vulkan/gen8_cmd_buffer.c
> > > > b/src/intel/vulkan/gen8_cmd_buffer.c
> > > > > index e6a3c3d..44ffcbf 100644
> > > > > --- a/src/intel/vulkan/gen8_cmd_buffer.c
> > > > > +++ b/src/intel/vulkan/gen8_cmd_buffer.c
> > > > > @@ -513,7 +513,17 @@ genX(cmd_buffer_emit_hz_op)(struct
> > anv_cmd_buffer
> > > > *cmd_buffer,
> > > > >           return;
> > > > >        break;
> > > > >     case BLORP_HIZ_OP_HIZ_RESOLVE:
> > > > > -      if (cmd_buffer->state.pass->attachments[ds].load_op !=
> > > > > +      /* If the render area covers the entire surface *and* load_op
> > is
> > > > either
> > > > > +       * CLEAR or DONT_CARE then the previous contents of the depth
> > > > buffer
> > > > > +       * will be entirely discarded.  In this case, we can skip the
> > HiZ
> > > > > +       * resolve.
> > > > > +       *
> > > > > +       * If the render area is not the full surface, we need to do
> > > > > +       * the resolve because otherwise data outside the render area
> > may
> > > > get
> > > > > +       * garbled by the resolve at the end of the render pass.
> > > > > +       */
> > > > > +      if (full_surface_op &&
> > > > > +          cmd_buffer->state.pass->attachments[ds].load_op !=
> > > > >            VK_ATTACHMENT_LOAD_OP_LOAD)
> > > > >           return;
> > > > >        break;
> > > > > --
> > > > > 2.5.0.400.gff86faf
> > > > >
> > > > > _______________________________________________
> > > > > mesa-dev mailing list
> > > > > mesa-dev at lists.freedesktop.org
> > > > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > > >
> >