[Mesa-dev] [PATCH 16/22] anv/hiz: Perform HiZ resolves for all partial renders
Jason Ekstrand
jason at jlekstrand.net
Fri Oct 14 19:46:31 UTC 2016
On Wed, Oct 12, 2016 at 9:01 AM, Nanley Chery <nanleychery at gmail.com> wrote:
> On Tue, Oct 11, 2016 at 06:55:53PM -0700, Jason Ekstrand wrote:
> > On Tue, Oct 11, 2016 at 6:16 PM, Nanley Chery <nanleychery at gmail.com>
> wrote:
> >
> > > On Mon, Oct 10, 2016 at 06:00:49PM -0700, Jason Ekstrand wrote:
> > > > On Mon, Oct 10, 2016 at 2:23 PM, Nanley Chery <nanleychery at gmail.com
> >
> > > wrote:
> > > >
> > > > > On Fri, Oct 07, 2016 at 09:41:14PM -0700, Jason Ekstrand wrote:
> > > > > > If we don't, we can end up with corruption in the portion of the
> > > depth
> > > > > > buffer that lies outside the render area when we do a HiZ
> resolve at
> > > the
> > > > > > end. The only reason we weren't seeing this before was that all
> of
> > > the
> > > > > > meta-based clears such as VkCmdClearDepthStencilImage were
> internally
> > > > > using
> > > > > > HiZ so the HiZ buffer never truly got out-of-sync. If the CTS
> ever
> > > > > tested
> > > > > > a depth upload (which doesn't care about HiZ) and then a partial
> > > render
> > > > > we
> > > > > > would have seen problems. Soon, we will be using blorp to do
> depth
> > > > > clears
> > > > > > and it won't bother with HiZ so we would get CTS regressions
> without
> > > > > this.
> > > > > >
> > > > >
> > > > > I understand the problem, but I think this solution unnecessarily
> > > > > penalizes the user's renderpass.
> > > > >
> > > > > Since depth buffer updates via vkCopy*ToImage and
> > > > > vkCmdClearDepthStencilImage cause the HiZ buffer to become stale,
> > > > > calling
> > > > >
> > > > > genX(cmd_buffer_emit_hz_op)(cmd_buffer, BLORP_HIZ_OP_HIZ_RESOLVE);
> > > > >
> > > > > at the bottom of those commands should fix the issue without the
> extra
> > > > > penalty. I'd imagine that as a prequisite, blorp would have to
> learn to
> > > > > emit enough depth stencil state for this command.
> > > > >
> > > >
> > > > I think that's dangerously mixing HiZ data validity models. There
> are 3
> > > > basic aux data validity models that we've thrown around:
> > > >
> > > > 1) AUX is always correct.
> > > > 2) AUX is correct within a render pass and invalid outside.
> > > > 3) Track whether or not AUX is valid and resolve only as needed.
> > > >
> > >
> > > What is the definition of correct here? I'd assume you mean that the
> > > data matches what's in the depth buffer, but that sometimes may not be
> > > the case (STORE_OP_DONTCARE) yet the program behavior is correct
> > > nonetheless.
> > >
> >
> > By "correct" I mean "consistent with the depth buffer" or, more
> precicely,
> > "all well-defined pixels of the depth buffer are consistent with the HiZ
> > buffer". We *may* be able to avoid the depth resolve at the end if you
> > have STORE_OP_DONT_CARE. However, we would probably not do anything
> > interesting with LOAD_OP_DONT_CARE.
> >
>
>
> With this definition of correct (accessing either buffer will give you
> the correct value due to their being consistent with each other), the
> current implementation is arguably a course-grained version of (3) (no
> tracking, let's call this 4) than it is (2). The HiZ buffer is only
> consistent with the depth buffer when a user performs an operation that
> likely requires it to be so. For example:
>
> * LOAD_OP_LOAD -> HiZ Resolve (consistent)
> * LOAD_OP_CLEAR -> No resolve, Fast Depth Clear (inconsistent)
> * vkCmdDraw* -> No resolve (inconsistent)
> * STORE_OP_STORE -> Depth Resolve (consistent)
>
> >
> > > Also, could you please explain where the danger comes into play?
> > >
> >
> > We need to have a solid mental model of when HiZ and depth are
> consistent.
> > Otherwise, we'll make mistakes, things will get inconsistent, and we'll
>
> Agreed.
>
> > have weird bugs. This bug is a good example of this. Our mental model
> (2)
> > works fine except that we were leaking garbage depth from DONT_CARE when
> we
> > have a partial areat. Just doing a HiZ resolve after a blorp clear
> "fixes"
> > the bug by making things always consistent (mental model 1). But then it
>
> As mentioned above, I'm not advocating mixing 1 and 2, but covering a
> missed case in 4. Whether or not that mental model is solid seems like
> a subjective claim.
>
> > means that we have LOAD_OP_LOAD, we're doing two HiZ resolves which we
> > don't want either.
> >
>
> I wouldn't expect Vulkan apps to submit image copies as frequently as
> render passes, so my thinking is that an extra HiZ resolve at the end of
> an Image copy should have less of an impact on FPS than performing the
> resolve on every clearing RP that doesn't use a full render area. I
> did not write the patch to test my suggestion, but I was able to get a
> measurable difference by forcing HiZ resolves on the triangle demo. I
> won't post the numbers I obtained from the questionable method of
> eye-balling the FPS counter (especially with the amount of variance
> involved), so feel free to take a look at it yourself.
>
I wouldn't expect copies to happen particularly frequently either and I
certainly don't think we should optimize for them.
> I think the temporarily minor dip in performance introduced here is a
> small price to pay for the removal of meta. I also think my suggestion
> may be a lot more work than it seemed initially, so it's likely best to
> revisit this later.
>
Agreed. I think we need to do some reading and thinking about image
layouts. I believe that's how we're intended to do these sorts of things
in Vulkan. My initial thought would be to have a mapping from image layout
to a "use HiZ" flag that would look something like this:
GENERAL -> no
DEPTH_STENCIL_ATTACHMENT_OPTIMAL -> yes
SHADER_READ_ONLY_OPTIMAL -> gen >= 8
TRANSFER_SRC_OPTIMAL -> gen >= 8
TRANSFER_DST_OPTIMAL -> no
And then, when they do a CmdPipelineBarrier that transitions the layout in
such a way that "use HiZ" goes from yes to no we do a depth resolve and
when it goes from no to yes, we do a HiZ resolve. The layout transitions
are per-layer and per-miplevel so the client is required to do basically
the exact same tracking we do in the GL driver today.
> This patch is:
> Reviewed-by: Nanley Chery <nanley.g.chery at intel.com>
>
Thanks!
>
> > As I intended to say below, I don't mind moving to (1), but if that's
> what
> > we want to do we should commit to it and change rather than going with
> > half-and-half.
> >
> >
> > > > So far, our initial enabling strategy has been (2). We can move to
> (1)
> > > or
> > > > maybe even (3) with layout transitions, but that's not what we've
> done so
> > > > far. Your suggestion is to mix in a little of (1) because there is
> a bug
> > > > in our implementation of (2).
> > > >
> > > > I'm Ok, for gen8+ HiZ, with moving to (2) eventually since gen8 is
> > > capable
> > > ^
> > > I'm not sure which model
> you're
> > > referring to here since you
> > > said this was our initial
> > > enabling model.
> > >
> >
> > I meant "moving to (1)". Sorry for the typo.
> >
> >
> > >
> > > -Nanley
> > >
> > > > of sampling with HiZ. (Maybe we can even move to (3).) However,
> that's
> > > > not where we are right now and I don't really want to start mixing
> mental
> > > > models.
> > > >
> > > > --Jason
> > > >
> > > >
> > > > > -Nanley
> > > > >
> > > > > > Signed-off-by: Jason Ekstrand <jason at jlekstrand.net>
> > > > > > ---
> > > > > > src/intel/vulkan/gen8_cmd_buffer.c | 12 +++++++++++-
> > > > > > 1 file changed, 11 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/src/intel/vulkan/gen8_cmd_buffer.c
> > > > > b/src/intel/vulkan/gen8_cmd_buffer.c
> > > > > > index e6a3c3d..44ffcbf 100644
> > > > > > --- a/src/intel/vulkan/gen8_cmd_buffer.c
> > > > > > +++ b/src/intel/vulkan/gen8_cmd_buffer.c
> > > > > > @@ -513,7 +513,17 @@ genX(cmd_buffer_emit_hz_op)(struct
> > > anv_cmd_buffer
> > > > > *cmd_buffer,
> > > > > > return;
> > > > > > break;
> > > > > > case BLORP_HIZ_OP_HIZ_RESOLVE:
> > > > > > - if (cmd_buffer->state.pass->attachments[ds].load_op !=
> > > > > > + /* If the render area covers the entire surface *and*
> load_op
> > > is
> > > > > either
> > > > > > + * CLEAR or DONT_CARE then the previous contents of the
> depth
> > > > > buffer
> > > > > > + * will be entirely discarded. In this case, we can skip
> the
> > > HiZ
> > > > > > + * resolve.
> > > > > > + *
> > > > > > + * If the render area is not the full surface, we need to
> do
> > > > > > + * the resolve because otherwise data outside the render
> area
> > > may
> > > > > get
> > > > > > + * garbled by the resolve at the end of the render pass.
> > > > > > + */
> > > > > > + if (full_surface_op &&
> > > > > > + cmd_buffer->state.pass->attachments[ds].load_op !=
> > > > > > VK_ATTACHMENT_LOAD_OP_LOAD)
> > > > > > return;
> > > > > > break;
> > > > > > --
> > > > > > 2.5.0.400.gff86faf
> > > > > >
> > > > > > _______________________________________________
> > > > > > mesa-dev mailing list
> > > > > > mesa-dev at lists.freedesktop.org
> > > > > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > > > >
> > >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20161014/29f51d77/attachment-0001.html>
More information about the mesa-dev
mailing list