[Mesa-dev] [PATCH 16/22] anv/hiz: Perform HiZ resolves for all partial renders

Nanley Chery nanleychery at gmail.com
Fri Dec 2 22:40:58 UTC 2016


On Fri, Oct 14, 2016 at 12:46:31PM -0700, Jason Ekstrand wrote:
> On Wed, Oct 12, 2016 at 9:01 AM, Nanley Chery <nanleychery at gmail.com> wrote:
> 
> > On Tue, Oct 11, 2016 at 06:55:53PM -0700, Jason Ekstrand wrote:
> > > On Tue, Oct 11, 2016 at 6:16 PM, Nanley Chery <nanleychery at gmail.com>
> > wrote:
> > >
> > > > On Mon, Oct 10, 2016 at 06:00:49PM -0700, Jason Ekstrand wrote:
> > > > > On Mon, Oct 10, 2016 at 2:23 PM, Nanley Chery <nanleychery at gmail.com
> > >
> > > > wrote:
> > > > >
> > > > > > On Fri, Oct 07, 2016 at 09:41:14PM -0700, Jason Ekstrand wrote:
> > > > > > > If we don't, we can end up with corruption in the portion of the
> > > > depth
> > > > > > > buffer that lies outside the render area when we do a HiZ
> > resolve at
> > > > the
> > > > > > > end.  The only reason we weren't seeing this before was that all
> > of
> > > > the
> > > > > > > meta-based clears such as VkCmdClearDepthStencilImage were
> > internally
> > > > > > using
> > > > > > > HiZ so the HiZ buffer never truly got out-of-sync.  If the CTS
> > ever
> > > > > > tested
> > > > > > > a depth upload (which doesn't care about HiZ) and then a partial
> > > > render
> > > > > > we
> > > > > > > would have seen problems.  Soon, we will be using blorp to do
> > depth
> > > > > > clears
> > > > > > > and it won't bother with HiZ so we would get CTS regressions
> > without
> > > > > > this.
> > > > > > >
> > > > > >
> > > > > > I understand the problem, but I think this solution unnecessarily
> > > > > > penalizes the user's renderpass.
> > > > > >
> > > > > > Since depth buffer updates via vkCopy*ToImage and
> > > > > > vkCmdClearDepthStencilImage cause the HiZ buffer to become stale,
> > > > > > calling
> > > > > >
> > > > > > genX(cmd_buffer_emit_hz_op)(cmd_buffer, BLORP_HIZ_OP_HIZ_RESOLVE);
> > > > > >
> > > > > > at the bottom of those commands should fix the issue without the
> > extra
> > > > > > penalty. I'd imagine that as a prequisite, blorp would have to
> > learn to
> > > > > > emit enough depth stencil state for this command.
> > > > > >
> > > > >
> > > > > I think that's dangerously mixing HiZ data validity models.  There
> > are 3
> > > > > basic aux data validity models that we've thrown around:
> > > > >
> > > > >  1) AUX is always correct.
> > > > >  2) AUX is correct within a render pass and invalid outside.
> > > > >  3) Track whether or not AUX is valid and resolve only as needed.
> > > > >
> > > >
> > > > What is the definition of correct here? I'd assume you mean that the
> > > > data matches what's in the depth buffer, but that sometimes may not be
> > > > the case (STORE_OP_DONTCARE) yet the program behavior is correct
> > > > nonetheless.
> > > >
> > >
> > > By "correct" I mean "consistent with the depth buffer" or, more
> > precicely,
> > > "all well-defined pixels of the depth buffer are consistent with the HiZ
> > > buffer".  We *may* be able to avoid the depth resolve at the end if you
> > > have STORE_OP_DONT_CARE.  However, we would probably not do anything
> > > interesting with LOAD_OP_DONT_CARE.
> > >
> >
> >
> > With this definition of correct (accessing either buffer will give you
> > the correct value due to their being consistent with each other), the
> > current implementation is arguably a course-grained version of (3) (no
> > tracking, let's call this 4) than it is (2). The HiZ buffer is only
> > consistent with the depth buffer when a user performs an operation that
> > likely requires it to be so. For example:
> >
> > * LOAD_OP_LOAD -> HiZ Resolve (consistent)
> > * LOAD_OP_CLEAR -> No resolve, Fast Depth Clear (inconsistent)
> > * vkCmdDraw* -> No resolve (inconsistent)
> > * STORE_OP_STORE -> Depth Resolve (consistent)
> >
> > >
> > > > Also, could you please explain where the danger comes into play?
> > > >
> > >
> > > We need to have a solid mental model of when HiZ and depth are
> > consistent.
> > > Otherwise, we'll make mistakes, things will get inconsistent, and we'll
> >
> > Agreed.
> >
> > > have weird bugs.  This bug is a good example of this.  Our mental model
> > (2)
> > > works fine except that we were leaking garbage depth from DONT_CARE when
> > we
> > > have a partial areat.  Just doing a HiZ resolve after a blorp clear
> > "fixes"
> > > the bug by making things always consistent (mental model 1).  But then it
> >
> > As mentioned above, I'm not advocating mixing 1 and 2, but covering a
> > missed case in 4. Whether or not that mental model is solid seems like
> > a subjective claim.
> >
> > > means that we have LOAD_OP_LOAD, we're doing two HiZ resolves which we
> > > don't want either.
> > >
> >
> > I wouldn't expect Vulkan apps to submit image copies as frequently as
> > render passes, so my thinking is that an extra HiZ resolve at the end of
> > an Image copy should have less of an impact on FPS than performing the
> > resolve on every clearing RP that doesn't use a full render area. I
> > did not write the patch to test my suggestion, but I was able to get a
> > measurable difference by forcing HiZ resolves on the triangle demo. I
> > won't post the numbers I obtained from the questionable method of
> > eye-balling the FPS counter (especially with the amount of variance
> > involved), so feel free to take a look at it yourself.
> >
> 
> I wouldn't expect copies to happen particularly frequently either and I
> certainly don't think we should optimize for them.
> 
> 
> > I think the temporarily minor dip in performance introduced here is a
> > small price to pay for the removal of meta. I also think my suggestion
> > may be a lot more work than it seemed initially, so it's likely best to
> > revisit this later.
> >
> 
> Agreed.  I think we need to do some reading and thinking about image
> layouts.  I believe that's how we're intended to do these sorts of things
> in Vulkan.  My initial thought would be to have a mapping from image layout
> to a "use HiZ" flag that would look something like this:
> 
> GENERAL -> no
> DEPTH_STENCIL_ATTACHMENT_OPTIMAL -> yes
> SHADER_READ_ONLY_OPTIMAL -> gen >= 8
> TRANSFER_SRC_OPTIMAL -> gen >= 8
> TRANSFER_DST_OPTIMAL -> no
> 
> And then, when they do a CmdPipelineBarrier that transitions the layout in
> such a way that "use HiZ" goes from yes to no we do a depth resolve and
> when it goes from no to yes, we do a HiZ resolve.  The layout transitions
> are per-layer and per-miplevel so the client is required to do basically
> the exact same tracking we do in the GL driver today.
> 
> 

This seems like a good plan.

-Nanley


More information about the mesa-dev mailing list