<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Aug 31, 2016 at 8:29 PM, Nanley Chery <span dir="ltr"><<a href="mailto:nanleychery@gmail.com" target="_blank">nanleychery@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">From: Jason Ekstrand <<a href="mailto:jason.ekstrand@intel.com">jason.ekstrand@intel.com</a>><br></blockquote><div><br></div>First off, this is your patch not mine. The patch of mine you based this on was little more than a skeleton that demonstrated how to use PIPE_CONTROL. All of the interesting stuff in here is yours.<br><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Nanley Chery:<br>
(rebase)<br>
- Resolve conflicts with the new anv_batch_emit macro<br>
(amend)<br>
- Update commit title<br>
- Combine all HZ operations into one function<br>
- Add code for performing HiZ resolve operations<br>
- Add proper stencil and multisampling support<br>
- Set the proper clear rectangles<br>
- Add required cases for aborting an HZ operation<br>
<br>
Signed-off-by: Nanley Chery <<a href="mailto:nanley.g.chery@intel.com">nanley.g.chery@intel.com</a>><br>
---<br>
src/intel/vulkan/anv_genX.h | 3 +<br>
src/intel/vulkan/anv_private.h | 6 ++<br>
src/intel/vulkan/gen7_cmd_<wbr>buffer.c | 5 ++<br>
src/intel/vulkan/gen8_cmd_<wbr>buffer.c | 124 ++++++++++++++++++++++++++++++<wbr>+++++++<br>
4 files changed, 138 insertions(+)<br>
<br>
diff --git a/src/intel/vulkan/anv_genX.h b/src/intel/vulkan/anv_genX.h<br>
index cf5a232..16de990 100644<br>
--- a/src/intel/vulkan/anv_genX.h<br>
+++ b/src/intel/vulkan/anv_genX.h<br>
@@ -54,6 +54,9 @@ void genX(cmd_buffer_flush_dynamic_<wbr>state)(struct anv_cmd_buffer *cmd_buffer);<br>
<br>
void genX(cmd_buffer_flush_compute_<wbr>state)(struct anv_cmd_buffer *cmd_buffer);<br>
<br>
+void genX(cmd_buffer_do_hz_op)(<wbr>struct anv_cmd_buffer *cmd_buffer,<br>
+ enum anv_hz_op op);<br>
+<br>
VkResult<br>
genX(graphics_pipeline_create)<wbr>(VkDevice _device,<br>
struct anv_pipeline_cache *cache,<br>
diff --git a/src/intel/vulkan/anv_<wbr>private.h b/src/intel/vulkan/anv_<wbr>private.h<br>
index 5718a19..40325fd 100644<br>
--- a/src/intel/vulkan/anv_<wbr>private.h<br>
+++ b/src/intel/vulkan/anv_<wbr>private.h<br>
@@ -1401,6 +1401,12 @@ anv_cmd_buffer_get_depth_<wbr>stencil_view(const struct anv_cmd_buffer *cmd_buffer);<br>
<br>
void anv_cmd_buffer_dump(struct anv_cmd_buffer *cmd_buffer);<br>
<br>
+enum anv_hz_op {<br>
+ ANV_HZ_OP_CLEAR,<br>
+ ANV_HZ_OP_HIZ_RESOLVE,<br>
+ ANV_HZ_OP_DEPTH_RESOLVE,<br>
+};<br></blockquote><div><br></div><div>Now that blorp is in its own folder, we could use the blorp_hiz_op enum instead of rolling our own. That'll make it easier to add gen7 support.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
+<br>
struct anv_fence {<br>
struct anv_bo bo;<br>
struct drm_i915_gem_execbuffer2 execbuf;<br>
diff --git a/src/intel/vulkan/gen7_cmd_<wbr>buffer.c b/src/intel/vulkan/gen7_cmd_<wbr>buffer.c<br>
index 61778aa..a057a04 100644<br>
--- a/src/intel/vulkan/gen7_cmd_<wbr>buffer.c<br>
+++ b/src/intel/vulkan/gen7_cmd_<wbr>buffer.c<br>
@@ -323,6 +323,11 @@ genX(cmd_buffer_flush_dynamic_<wbr>state)(struct anv_cmd_buffer *cmd_buffer)<br>
cmd_buffer->state.dirty = 0;<br>
}<br>
<br>
+void<br>
+genX(cmd_buffer_do_hz_op)(<wbr>struct anv_cmd_buffer *cmd_buffer, enum anv_hz_op op)<br>
+{<br>
+}<br>
+<br>
void genX(CmdSetEvent)(<br>
VkCommandBuffer commandBuffer,<br>
VkEvent event,<br>
diff --git a/src/intel/vulkan/gen8_cmd_<wbr>buffer.c b/src/intel/vulkan/gen8_cmd_<wbr>buffer.c<br>
index e22b4e2..4f27350 100644<br>
--- a/src/intel/vulkan/gen8_cmd_<wbr>buffer.c<br>
+++ b/src/intel/vulkan/gen8_cmd_<wbr>buffer.c<br>
@@ -399,6 +399,130 @@ genX(cmd_buffer_flush_compute_<wbr>state)(struct anv_cmd_buffer *cmd_buffer)<br>
genX(cmd_buffer_apply_pipe_<wbr>flushes)(cmd_buffer);<br>
}<br>
<br>
+<br>
+/**<br>
+ * Emit the HZ_OP packet in the sequence specified by the BDW PRM section<br>
+ * entitled: "Optimized Depth Buffer Clear and/or Stencil Buffer Clear."<br>
+ */<br>
+void<br>
+genX(cmd_buffer_do_hz_op)(<wbr>struct anv_cmd_buffer *cmd_buffer, enum anv_hz_op op)<br>
+{<br>
+ struct anv_cmd_state *cmd_state = &cmd_buffer->state;<br>
+ const struct anv_image_view *iview =<br>
+ anv_cmd_buffer_get_depth_<wbr>stencil_view(cmd_buffer);<br>
+<br>
+ if (iview == NULL || !anv_image_has_hiz(iview-><wbr>image))<br>
+ return;<br></blockquote><div><br></div><div>This looks like something that would be better as an assert. Silently doing nothing is probably fine for resolves. For clears on the other hand, it means silently *not* clearing which would be bad. <br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
+<br>
+ const uint32_t ds = cmd_state->subpass->depth_<wbr>stencil_attachment;<br>
+ const bool full_surface_op =<br>
+ cmd_state->render_area.extent.<wbr>width == iview->extent.width &&<br>
+ cmd_state->render_area.extent.<wbr>height == iview->extent.height;<br></blockquote><div><br></div><div>I think you also need render_area.offset == 0 checks?<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
+<br>
+ /* Validate that we can perform the HZ operation and that it's necessary. */<br>
+ switch (op) {<br>
+ case ANV_HZ_OP_CLEAR:<br>
+ if (cmd_buffer->state.pass-><wbr>attachments[ds].load_op !=<br>
+ VK_ATTACHMENT_LOAD_OP_CLEAR)<br>
+ return;<br>
+<br>
+ /* Apply alignment restrictions. For a sample count of 16, the formulas<br>
+ * reduce to identity and indicate that no alignment is required.<br>
+ */<br></blockquote><div><br></div><div>I think you're being a bit over-restrictive here. From the Sky Lake PRM Vol. 7 "Hierarchical Depth Buffer" (p. 626 in my pdf):<br><br>"The minimum granularity of clear is one pixel, but all samples of the pixel must be cleared. Clearing partial samples of a pixel is not supported. If a newly allocated depth buffer is not padded to an integer multiple of 8x4 pixels, and if the first operation on the depth buffer does not clear the entire width and height of the surface, then first a HiZ ambiguate must be done on the portions of the depth buffer that are not cleared. If the depth buffer clear operation does clear the entire width and height of the surface, then the “full surface clear” bit in 3DSTATE_WM_OP<br>must be set to 1."<br><br></div><div>In other words, the granularity of the HiZ operation is 1 pixel. However, you need to ensure that the the HiZ buffer is valid for all 8x4 blocks or else strange things may happen. We already do this with a memset at BindImageMemory time so our HiZ buffer should always contain valid data.<br><br></div><div>In the Broadwell PRM, same section, it does list that alignment requirement but says that it only applies to D16_UNORM surfaces. For all other surface formats, Broadwell appears to follow the Sky Lake rules where, as long as the HiZ buffer contains valid data, everything goes.<br><br></div><div>I think we need a crucible test for partial (maybe even multisampled) depth clears.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
+ if (!full_surface_op && iview->image->samples < 16) {<br></blockquote><div><br></div><div>If my understanding of Sky Lake is correct, the samples < 16 doesn't matter because the alignment restrictions don't apply to Sky Lake anyway.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
+ uint32_t align_w = 1;<br>
+ uint32_t align_h = 1;<br>
+<br>
+ if (iview->image->samples > 1) {<br>
+ isl_msaa_interleaved_scale_px_<wbr>to_sa(iview->image->samples,<br>
+ &align_w, &align_h);<br>
+ }<br>
+<br>
+ align_w = 8 / align_w;<br>
+ align_h = 4 / align_h;<br>
+<br>
+ if (cmd_state->render_area.<wbr>offset.x % align_w ||<br>
+ cmd_state->render_area.offset.<wbr>y % align_h ||<br>
+ cmd_state->render_area.extent.<wbr>width % align_w ||<br>
+ cmd_state->render_area.extent.<wbr>height % align_h)<br>
+ return;<br>
+ }<br>
+ break;<br>
+ case ANV_HZ_OP_DEPTH_RESOLVE:<br>
+ if (cmd_buffer->state.pass-><wbr>attachments[ds].store_op !=<br>
+ VK_ATTACHMENT_STORE_OP_STORE)<br>
+ return;<br>
+ break;<br>
+ case ANV_HZ_OP_HIZ_RESOLVE:<br>
+ if (cmd_buffer->state.pass-><wbr>attachments[ds].load_op !=<br>
+ VK_ATTACHMENT_LOAD_OP_LOAD)<br>
+ return;<br>
+ break;<br>
+ }<br>
+<br>
+ anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(3DSTATE_WM_HZ_OP), hzp) {<br>
+ switch (op) {<br>
+ case ANV_HZ_OP_CLEAR:<br>
+ hzp.StencilBufferClearEnable = VK_IMAGE_ASPECT_STENCIL_BIT &<br>
+ cmd_state->attachments[ds].<wbr>pending_clear_aspects;<br>
+ hzp.DepthBufferClearEnable = VK_IMAGE_ASPECT_DEPTH_BIT &<br>
+ cmd_state->attachments[ds].<wbr>pending_clear_aspects;<br>
+ hzp.<wbr>FullSurfaceDepthandStencilClea<wbr>r = full_surface_op;<br>
+ hzp.StencilClearValue = 0xff &<br>
+ cmd_state->attachments[ds].<wbr>clear_value.depthStencil.<wbr>stencil;<br>
+<br>
+ /* Mark aspects as cleared */<br>
+ cmd_state->attachments[ds].<wbr>pending_clear_aspects = 0;<br>
+ break;<br>
+ case ANV_HZ_OP_DEPTH_RESOLVE:<br>
+ hzp.DepthBufferResolveEnable = true;<br>
+ break;<br>
+ case ANV_HZ_OP_HIZ_RESOLVE:<br>
+ hzp.<wbr>HierarchicalDepthBufferResolve<wbr>Enable = true;<br>
+ break;<br>
+ }<br>
+<br>
+ /* The depth resolve rectangle must match the size of the previous clear<br>
+ * rectangle.<br>
+ *<br>
+ * The HiZ resolve rectangle is specified as needing to be the<br>
+ * size of the full RT and aligned to 8x4, these requirements are in<br>
+ * conflict if the RT extent is not a multiple of 8x4. Testing shows<br>
+ * that setting the rectangle to match the render area works just fine.<br>
+ *<br>
+ * In a manner similar to i965, we'd like to diverge from the PRMs here<br>
+ * to reduce the number of HiZ blocks written to.<br></blockquote><div><br></div><div>Not aligning to 8x4 shouldn't affect the number of HiZ blocks written to. Maybe I'm confused :/<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
+ */<br>
+ hzp.ClearRectangleXMin = anv_minify(cmd_state->render_<wbr>area.offset.x,<br>
+ iview->base_mip);<br>
+ hzp.ClearRectangleYMin = anv_minify(cmd_state->render_<wbr>area.offset.y,<br>
+ iview->base_mip);<br>
+ hzp.ClearRectangleXMax = anv_minify(cmd_state->render_<wbr>area.offset.x +<br>
+ cmd_state->render_area.extent.<wbr>width,<br>
+ iview->base_mip);<br>
+ hzp.ClearRectangleYMax = anv_minify(cmd_state->render_<wbr>area.offset.y +<br>
+ cmd_state->render_area.extent.<wbr>height,<br>
+ iview->base_mip);<br></blockquote><div><br></div><div>This isn't correct. The size of the render area should already take into account the minification. When you render into a non-zero LOD of an image, the size of your framebuffer is the size of that LOD and the render area must be a subset of that.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
+<br>
+ /* Due to a hardware issue, this bit MBZ */<br>
+ hzp.ScissorRectangleEnable = false;<br>
+ hzp.NumberofMultisamples = ffs(iview->image->samples) - 1;<br>
+ hzp.SampleMask = 0xFFFF;<br>
+ }<br>
+<br>
+ anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(PIPE_CONTROL), pc) {<br>
+ pc.PostSyncOperation = WriteImmediateData;<br>
+ pc.Address =<br>
+ (struct anv_address){ &cmd_buffer->device-><wbr>workaround_bo, 0 };<br>
+ }<br>
+<br>
+ anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(3DSTATE_WM_HZ_OP), hzp);<br>
+<br>
+ /* TODO: Determine if a DepthFlush and DepthStall is really necessary for<br>
+ * non-full_surface_op clears.<br>
+ */<br>
+}<br>
+<br>
void genX(CmdSetEvent)(<br>
VkCommandBuffer commandBuffer,<br>
VkEvent _event,<br>
<span class=""><font color="#888888">--<br>
2.9.3<br>
<br>
</font></span></blockquote></div><br></div></div>