<div dir="ltr"> <div class="gmail_extra"> <div class="gmail_quote">On Wed, Oct 12, 2016 at 9:01 AM, Nanley Chery <<a href="mailto:nanleychery@gmail.com" target="_blank">nanleychery@gmail.com</a>> wrote: <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Tue, Oct 11, 2016 at 06:55:53PM -0700, Jason Ekstrand wrote: > On Tue, Oct 11, 2016 at 6:16 PM, Nanley Chery <<a href="mailto:nanleychery@gmail.com">nanleychery@gmail.com</a>> wrote: > > > On Mon, Oct 10, 2016 at 06:00:49PM -0700, Jason Ekstrand wrote: > > > On Mon, Oct 10, 2016 at 2:23 PM, Nanley Chery <<a href="mailto:nanleychery@gmail.com">nanleychery@gmail.com</a>> > > wrote: > > > > > > > On Fri, Oct 07, 2016 at 09:41:14PM -0700, Jason Ekstrand wrote: > > > > > If we don't, we can end up with corruption in the portion of the > > depth > > > > > buffer that lies outside the render area when we do a HiZ resolve at > > the > > > > > end. The only reason we weren't seeing this before was that all of > > the > > > > > meta-based clears such as VkCmdClearDepthStencilImage were internally > > > > using > > > > > HiZ so the HiZ buffer never truly got out-of-sync. If the CTS ever > > > > tested > > > > > a depth upload (which doesn't care about HiZ) and then a partial > > render > > > > we > > > > > would have seen problems. Soon, we will be using blorp to do depth > > > > clears > > > > > and it won't bother with HiZ so we would get CTS regressions without > > > > this. > > > > > > > > > > > > > I understand the problem, but I think this solution unnecessarily > > > > penalizes the user's renderpass. > > > > > > > > Since depth buffer updates via vkCopy*ToImage and > > > > vkCmdClearDepthStencilImage cause the HiZ buffer to become stale, > > > > calling > > > > > > > > genX(cmd_buffer_emit_hz_op)(cmd_buffer, BLORP_HIZ_OP_HIZ_RESOLVE); > > > > > > > > at the bottom of those commands should fix the issue without the extra > > > > penalty. I'd imagine that as a prequisite, blorp would have to learn to > > > > emit enough depth stencil state for this command. > > > > > > > > > > I think that's dangerously mixing HiZ data validity models. There are 3 > > > basic aux data validity models that we've thrown around: > > > > > > 1) AUX is always correct. > > > 2) AUX is correct within a render pass and invalid outside. > > > 3) Track whether or not AUX is valid and resolve only as needed. > > > > > > > What is the definition of correct here? I'd assume you mean that the > > data matches what's in the depth buffer, but that sometimes may not be > > the case (STORE_OP_DONTCARE) yet the program behavior is correct > > nonetheless. > > > > By "correct" I mean "consistent with the depth buffer" or, more precicely, > "all well-defined pixels of the depth buffer are consistent with the HiZ > buffer". We *may* be able to avoid the depth resolve at the end if you > have STORE_OP_DONT_CARE. However, we would probably not do anything > interesting with LOAD_OP_DONT_CARE. > </div></div>With this definition of correct (accessing either buffer will give you the correct value due to their being consistent with each other), the current implementation is arguably a course-grained version of (3) (no tracking, let's call this 4) than it is (2). The HiZ buffer is only consistent with the depth buffer when a user performs an operation that likely requires it to be so. For example: * LOAD_OP_LOAD -> HiZ Resolve (consistent) * LOAD_OP_CLEAR -> No resolve, Fast Depth Clear (inconsistent) * vkCmdDraw* -> No resolve (inconsistent) * STORE_OP_STORE -> Depth Resolve (consistent) > > > Also, could you please explain where the danger comes into play? > > > > We need to have a solid mental model of when HiZ and depth are consistent. > Otherwise, we'll make mistakes, things will get inconsistent, and we'll Agreed. > have weird bugs. This bug is a good example of this. Our mental model (2) > works fine except that we were leaking garbage depth from DONT_CARE when we > have a partial areat. Just doing a HiZ resolve after a blorp clear "fixes" > the bug by making things always consistent (mental model 1). But then it As mentioned above, I'm not advocating mixing 1 and 2, but covering a missed case in 4. Whether or not that mental model is solid seems like a subjective claim. > means that we have LOAD_OP_LOAD, we're doing two HiZ resolves which we > don't want either. > I wouldn't expect Vulkan apps to submit image copies as frequently as render passes, so my thinking is that an extra HiZ resolve at the end of an Image copy should have less of an impact on FPS than performing the resolve on every clearing RP that doesn't use a full render area. I did not write the patch to test my suggestion, but I was able to get a measurable difference by forcing HiZ resolves on the triangle demo. I won't post the numbers I obtained from the questionable method of eye-balling the FPS counter (especially with the amount of variance involved), so feel free to take a look at it yourself. </blockquote><div> </div><div>I wouldn't expect copies to happen particularly frequently either and I certainly don't think we should optimize for them. </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> I think the temporarily minor dip in performance introduced here is a small price to pay for the removal of meta. I also think my suggestion may be a lot more work than it seemed initially, so it's likely best to revisit this later. </blockquote><div> </div><div>Agreed. I think we need to do some reading and thinking about image layouts. I believe that's how we're intended to do these sorts of things in Vulkan. My initial thought would be to have a mapping from image layout to a "use HiZ" flag that would look something like this: </div><div>GENERAL -> no DEPTH_STENCIL_ATTACHMENT_OPTIMAL -> yes </div><div>SHADER_READ_ONLY_OPTIMAL -> gen >= 8 </div><div>TRANSFER_SRC_OPTIMAL -> gen >= 8 </div><div>TRANSFER_DST_OPTIMAL -> no </div><div>And then, when they do a CmdPipelineBarrier that transitions the layout in such a way that "use HiZ" goes from yes to no we do a depth resolve and when it goes from no to yes, we do a HiZ resolve. The layout transitions are per-layer and per-miplevel so the client is required to do basically the exact same tracking we do in the GL driver today. </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> This patch is: Reviewed-by: Nanley Chery <<a href="mailto:nanley.g.chery@intel.com">nanley.g.chery@intel.com</a>> </blockquote><div> </div><div>Thanks! </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <div class="HOEnZb"><div class="h5"> > As I intended to say below, I don't mind moving to (1), but if that's what > we want to do we should commit to it and change rather than going with > half-and-half. > > > > > So far, our initial enabling strategy has been (2). We can move to (1) > > or > > > maybe even (3) with layout transitions, but that's not what we've done so > > > far. Your suggestion is to mix in a little of (1) because there is a bug > > > in our implementation of (2). > > > > > > I'm Ok, for gen8+ HiZ, with moving to (2) eventually since gen8 is > > capable > > ^ > > I'm not sure which model you're > > referring to here since you > > said this was our initial > > enabling model. > > > > I meant "moving to (1)". Sorry for the typo. > > > > > > -Nanley > > > > > of sampling with HiZ. (Maybe we can even move to (3).) However, that's > > > not where we are right now and I don't really want to start mixing mental > > > models. > > > > > > --Jason > > > > > > > > > > -Nanley > > > > > > > > > Signed-off-by: Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>> > > > > > --- > > > > > src/intel/vulkan/gen8_cmd_buffer.c | 12 +++++++++++- > > > > > 1 file changed, 11 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/src/intel/vulkan/gen8_cmd_buffer.c > > > > b/src/intel/vulkan/gen8_cmd_buffer.c > > > > > index e6a3c3d..44ffcbf 100644 > > > > > --- a/src/intel/vulkan/gen8_cmd_buffer.c > > > > > +++ b/src/intel/vulkan/gen8_cmd_buffer.c > > > > > @@ -513,7 +513,17 @@ genX(cmd_buffer_emit_hz_op)(struct > > anv_cmd_buffer > > > > *cmd_buffer, > > > > > return; > > > > > break; > > > > > case BLORP_HIZ_OP_HIZ_RESOLVE: > > > > > - if (cmd_buffer->state.pass->attachments[ds].load_op != > > > > > + /* If the render area covers the entire surface *and* load_op > > is > > > > either > > > > > + * CLEAR or DONT_CARE then the previous contents of the depth > > > > buffer > > > > > + * will be entirely discarded. In this case, we can skip the > > HiZ > > > > > + * resolve. > > > > > + * > > > > > + * If the render area is not the full surface, we need to do > > > > > + * the resolve because otherwise data outside the render area > > may > > > > get > > > > > + * garbled by the resolve at the end of the render pass. > > > > > + */ > > > > > + if (full_surface_op && > > > > > + cmd_buffer->state.pass->attachments[ds].load_op != > > > > > VK_ATTACHMENT_LOAD_OP_LOAD) > > > > > return; > > > > > break; > > > > > -- > > > > > 2.5.0.400.gff86faf > > > > > > > > > > _______________________________________________ > > > > > mesa-dev mailing list > > > > > <a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a> > > > > > <a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/mailman/listinfo/mesa-dev</a> > > > > > > </div></div></blockquote></div> </div></div>