<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Fri, Sep 2, 2016 at 3:16 PM, Chad Versace <span dir="ltr"><<a href="mailto:chadversary@chromium.org" target="_blank">chadversary@chromium.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Wed 31 Aug 2016, Nanley Chery wrote:<br>
> From: Chad Versace <<a href="mailto:chad.versace@intel.com">chad.versace@intel.com</a>><br>
><br>
> Nanley Chery:<br>
> (rebase)<br>
> - Resolve conflicts with new anv_batch_emit macro<br>
> (amend)<br>
> - Remove wip! tag and handle a QPitch TODO<br>
> - Emit 3DSTATE_HIER_DEPTH_BUFFER on pre-BDW systems<br>
> - Only use HiZ for single-subpass renderpasses<br>
> - Emit the HiZ instruction before the stencil instruction to follow the<br>
> optimized clear sequence specified in the PRMs<br>
> - Don't modify clear params<br>
> - Enable resolves when a HiZ buffer is used to ensure depth buffer validity<br>
><br>
> Provides an FPS increase of ~15% on the Sascha triangle and multisampling<br>
> demos.<br>
><br>
> Signed-off-by: Nanley Chery <<a href="mailto:nanley.g.chery@intel.com">nanley.g.chery@intel.com</a>><br>
> ---<br>
> src/intel/vulkan/gen8_cmd_<wbr>buffer.c | 4 ++++<br>
> src/intel/vulkan/genX_cmd_<wbr>buffer.c | 41 ++++++++++++++++++++++++++++++<wbr>++++----<br>
> 2 files changed, 41 insertions(+), 4 deletions(-)<br>
><br>
> diff --git a/src/intel/vulkan/gen8_cmd_<wbr>buffer.c b/src/intel/vulkan/gen8_cmd_<wbr>buffer.c<br>
> index 4f27350..7f65fe2 100644<br>
> --- a/src/intel/vulkan/gen8_cmd_<wbr>buffer.c<br>
> +++ b/src/intel/vulkan/gen8_cmd_<wbr>buffer.c<br>
> @@ -414,6 +414,10 @@ genX(cmd_buffer_do_hz_op)(<wbr>struct anv_cmd_buffer *cmd_buffer, enum anv_hz_op op)<br>
> if (iview == NULL || !anv_image_has_hiz(iview-><wbr>image))<br>
> return;<br>
><br>
> + /* FIXME: Implement multi-subpass HiZ */<br>
> + if (cmd_buffer->state.pass-><wbr>subpass_count > 1)<br>
> + return;<br>
> +<br>
> const uint32_t ds = cmd_state->subpass->depth_<wbr>stencil_attachment;<br>
> const bool full_surface_op =<br>
> cmd_state->render_area.extent.<wbr>width == iview->extent.width &&<br>
> diff --git a/src/intel/vulkan/genX_cmd_<wbr>buffer.c b/src/intel/vulkan/genX_cmd_<wbr>buffer.c<br>
> index 95ed5f2..349d2a4 100644<br>
> --- a/src/intel/vulkan/genX_cmd_<wbr>buffer.c<br>
> +++ b/src/intel/vulkan/genX_cmd_<wbr>buffer.c<br>
> @@ -1040,6 +1040,7 @@ cmd_buffer_emit_depth_stencil(<wbr>struct anv_cmd_buffer *cmd_buffer)<br>
> anv_cmd_buffer_get_depth_<wbr>stencil_view(cmd_buffer);<br>
> const struct anv_image *image = iview ? iview->image : NULL;<br>
> const bool has_depth = image && (image->aspects & VK_IMAGE_ASPECT_DEPTH_BIT);<br>
> + const bool has_hiz = image != NULL && anv_image_has_hiz(image);<br>
> const bool has_stencil =<br>
> image && (image->aspects & VK_IMAGE_ASPECT_STENCIL_BIT);<br>
<br>
><br>
> @@ -1052,7 +1053,12 @@ cmd_buffer_emit_depth_stencil(<wbr>struct anv_cmd_buffer *cmd_buffer)<br>
> db.SurfaceType = SURFTYPE_2D;<br>
> db.DepthWriteEnable = true;<br>
> db.StencilWriteEnable = has_stencil;<br>
> - db.<wbr>HierarchicalDepthBufferEnable = false;<br>
> +<br>
> + if (cmd_buffer->state.pass-><wbr>subpass_count == 1) {<br>
> + db.<wbr>HierarchicalDepthBufferEnable = has_hiz;<br>
> + } else {<br>
> + anv_finishme("Multiple-subpass HiZ not implemented");<br>
> + }<br>
><br>
> db.SurfaceFormat = isl_surf_get_depth_format(&<wbr>device->isl_dev,<br>
> &image->depth_surface.isl);<br>
> @@ -1104,6 +1110,34 @@ cmd_buffer_emit_depth_stencil(<wbr>struct anv_cmd_buffer *cmd_buffer)<br>
> }<br>
> }<br>
><br>
> + if (has_hiz) {<br>
<br>
</div></div>Note: This codepath is hit sometimes when<br>
3DSTATE_DEPTH_BUFFER.<wbr>HierarchicalDepthBufferEnable is false.<br>
Specifically, when subpass_count > 1. It's weird, but I doubt it causes<br>
any harm. After all, all the surface data programmed by<br>
3DSTATE_HIER_BUFFER is valid here regardless of the value of<br>
HierarchicalDepthBufferEnable.<br>
<div><div class="h5"><br>
> + anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(3DSTATE_HIER_DEPTH_<wbr>BUFFER), hdb) {<br>
> + hdb.<wbr>HierarchicalDepthBufferObjectC<wbr>ontrolState = GENX(MOCS);<br>
> + hdb.SurfacePitch = image->hiz_surface.isl.row_<wbr>pitch - 1;<br>
> + hdb.SurfaceBaseAddress = (struct anv_address) {<br>
> + .bo = image->bo,<br>
> + .offset = image->offset + image->hiz_surface.offset,<br>
> + };<br>
> +#if GEN_GEN >= 8<br>
> + /* From the SKL PRM Vol2a:<br>
> + *<br>
> + * The interpretation of this field is dependent on Surface Type<br>
> + * as follows:<br>
> + * - SURFTYPE_1D: distance in pixels between array slices<br>
> + * - SURFTYPE_2D/CUBE: distance in rows between array slices<br>
> + * - SURFTYPE_3D: distance in rows between R - slices<br>
> + *<br>
> + * ISL implements HiZ surfaces for 1D depth buffers as 2D. Therefore<br>
> + * the depth buffer needs to be checked for the dimension.<br>
> + */<br>
> + hdb.SurfaceQPitch =<br>
> + image->depth_surface.isl.dim == ISL_SURF_DIM_1D ?<br>
> + isl_surf_get_array_pitch_el(&<wbr>image->hiz_surface.isl) >> 2 :<br>
> + isl_surf_get_array_pitch_el_<wbr>rows(&image->hiz_surface.isl) >> 2;<br>
> +#endif<br>
> + }<br>
> + }<br>
<br>
</div></div>Here I expected to see:<br>
<br>
} else {<br>
<span class=""> /* Disable hierarchial depth buffers. */<br>
</span> anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(3DSTATE_HIER_DEPTH_<wbr>BUFFER), hz);<br>
}<br>
<br>
I'm not insisting that you *should* do that; just that I expected it.<br>
<br>
Why? If the current depth surface has no hiz surface, then any<br>
previously emitted 3DSTATE_HIER_DEPTH_BUFFER remains active. That<br>
instruction persists across render passes. If the HiZ surface referenced<br>
by the stale 3DSTATE_HIER_DEPTH_BUFFER no longer exists, what happens?<br>
Does the GPU attempt to prefetch that surface before observing<br>
3DSTATE_DEPTH_BUFFER.<wbr>HierarchicalDepthBufferEnable is false, therefore<br>
performing an illegal memory access? Does the GPU first observe that<br>
3DSTATE_DEPTH_BUFFER.<wbr>HierarchicalDepthBufferEnable is false, and so<br>
ignore the active (but stale) 3DSTATE_HIER_DEPTH_BUFFER instruction,<br>
doesn't fetch the stale surface memory, and all is good? Or something<br>
completely different that we haven't thought of?<br>
<br>
I don't know what the GPU does in that case. So it makes me feel safer<br>
if we emit a zero-filled 3DSTATE_HIER_DEPTH_BUFFER when HiZ is disabled.<br>
After all, that's what i965 does. Maybe it's unneeded. But if<br>
a zero-filled 3DSTATE_HIER_DEPTH_BUFFER *is* needed, the only symptom<br>
would be rare and difficult-to-reproduce rendering bugs and GPU hangs.<br>
And I don't want to risk that.<span class=""><br></span></blockquote><div><br></div><div>I completely agree. On older hardware, DEPTH_BUFFER, HIER_DEPTH_BUFFER, and CLEAR_PARAMS were required to be emitted as a block. I think newer hardware may be a bit more forgiving, but I share Chad's reservations and would rather we just disable it.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">
> /* Emit 3DSTATE_STENCIL_BUFFER */<br>
> if (has_stencil) {<br>
> anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(3DSTATE_STENCIL_BUFFER), sb) {<br>
> @@ -1126,9 +1160,6 @@ cmd_buffer_emit_depth_stencil(<wbr>struct anv_cmd_buffer *cmd_buffer)<br>
> anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(3DSTATE_STENCIL_BUFFER), sb);<br>
> }<br>
><br>
> - /* Disable hierarchial depth buffers. */<br>
> - anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(3DSTATE_HIER_DEPTH_<wbr>BUFFER), hz);<br>
> -<br>
> /* Clear the clear params. */<br>
> anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(3DSTATE_CLEAR_PARAMS), cp);<br>
<br>
> }<br>
> @@ -1164,6 +1195,7 @@ void genX(CmdBeginRenderPass)(<br>
> genX(flush_pipeline_select_3d)<wbr>(cmd_buffer);<br>
><br>
> genX(cmd_buffer_set_subpass)(<wbr>cmd_buffer, pass->subpasses);<br>
> + genX(cmd_buffer_do_hz_op)(cmd_<wbr>buffer, ANV_HZ_OP_HIZ_RESOLVE);<br>
> anv_cmd_buffer_clear_subpass(<wbr>cmd_buffer);<br>
> }<br>
><br>
> @@ -1185,6 +1217,7 @@ void genX(CmdEndRenderPass)(<br>
> {<br>
> ANV_FROM_HANDLE(anv_cmd_<wbr>buffer, cmd_buffer, commandBuffer);<br>
><br>
> + genX(cmd_buffer_do_hz_op)(cmd_<wbr>buffer, ANV_HZ_OP_DEPTH_RESOLVE);<br>
> anv_cmd_buffer_resolve_<wbr>subpass(cmd_buffer);<br>
><br>
> #ifndef NDEBUG<br>
> --<br>
> 2.9.3<br>
><br>
</span>> ______________________________<wbr>_________________<br>
> mesa-dev mailing list<br>
> <a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a><br>
> <a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/<wbr>mailman/listinfo/mesa-dev</a><br>
______________________________<wbr>_________________<br>
mesa-dev mailing list<br>
<a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a><br>
<a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/<wbr>mailman/listinfo/mesa-dev</a><br>
</blockquote></div><br></div></div>