<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Nov 9, 2016 at 5:09 PM, Nanley Chery <span dir="ltr"><<a href="mailto:nanleychery@gmail.com" target="_blank">nanleychery@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Sat, Oct 22, 2016 at 10:50:31AM -0700, Jason Ekstrand wrote:<br>
</span><div><div class="h5">> This series does some fairly major surgery on color attachment surface<br>
> state allocation and fill-out in the Intel Vulkan driver. This is in<br>
> preparation for doing color compression, fast-clears, and HiZ-capable input<br>
> attachments. Naturally, as with everything else I've done in the last 2<br>
> months, it also involves some non-trivial blorp work.<br>
><br>
> Let's start off at the beginning... For a variety of reasons, we can't<br>
> really know 100% of the details of an attachment's surface state at any<br>
> other places than vkCmdBeginRenderPass and vkCmdNextSubpss. The same<br>
> applies for depth buffers if you consider 3DSTATE_DEPTH_BUFFER and friends<br>
> to be the depth and stencil buffer's "surface state". That's a fairly<br>
> strong statement, but there are a couple of reasons for this:<br>
><br>
> 1) In order for fast-clears to work, the surface state has to contain the<br>
> clear color. (This is it's own packet for HiZ but not for color.) We<br>
> don't know the clear value until BeginRenderPass. This means we can't<br>
> fully fill out the surface state in vkCmdCreateImageView.<br>
><br>
> 2) The Vulkan spec requires that you be able to call vkBeginCommandBuffer<br>
> on a secondary command buffer with USAGE_RENDER_PASS_CONTINUE_BIT set<br>
> but with a null framebuffer. In this case, the secondary is supposed<br>
> to inherit the framebuffer from the primary. (This is not something we<br>
> have properly implemented until now.) This means that anything that is<br>
> callable from a render-pass-continuing secondary command buffer has to<br>
> be able to operate without knowing any surface details that aren't part<br>
> of the VkRenderPass object. Basically, all you know is the Vulkan<br>
> format (not the isl format) and the sample count.<br>
><br>
> Between the two of those, about the only two entrypoints left at which we<br>
> actually know surface details are vkCmdBeginRenderPass and vkCmdNextSubpass<br>
> so we have to figure out how to do everything there. As it turns out, this<br>
> works out surprisingly well. The format and the sample count turn out to<br>
> be exactly the data we actually need in order to do all of our pipeline<br>
> programming. The only hard part is refactoring things so that it pulls the<br>
> data from the render pass instead of the framebuffer. There are a number<br>
> of places where we were grabbing the image view for an attachment because<br>
> we either wanted to shove something into blorp or because we wanted the<br>
> format and we were lazy.<br>
><br>
> The approach taken in this patch series is the following:<br>
><br>
> 1) Instead of allocating render target surface states in vkCreateImageView,<br>
> we allocate them as part of render pass setup in vkCmdBeginRenderPass.<br>
> All of the surface states we will ever need (including a null surface<br>
> state) are allocated up-front out of a single contiguous block.<br>
><br>
> 2) For secondary command buffers with USAGE_RENDER_PASS_CONTINUE_BIT set,<br>
> we allocate storage for all of the surface states but don't actually<br>
> fill them out. In the secondary command buffer, all binding tables<br>
> refer to these surface states rather than the ones in the primary.<br>
><br>
> 3) A blorp entrypoint is added that performs a clear operation without<br>
> touching the depth/stencil buffer state and with a color attachment<br>
> binding table explicitly provided by the caller. This means that even<br>
> our blorp clears are using the surface states allocated in<br>
> vkCmdBeginRenderPass. Unfortunately, this turned out to be more work<br>
> than expected because I had to add vertex shader support to blorp along<br>
> the way.<br>
><br>
> 4) Here's the tricky bit. When vkCmdExecuteCommands is called during a<br>
> render pass, we use transform feedback (yeah, crazy) to copy the block<br>
> of surface states from the primary into the secondary right before<br>
> executing the secondary.<br>
<br>
</div></div>Could we perform a CPU memcpy at this stage?<span class="HOEnZb"><font color="#888888"><br></font></span></blockquote><div><br></div><div>We can but only if the secondary is created with the SIMULTANEOUS_USE_BIT unset. If SIMULTANEOUS_USE_BIT is set, then it may be used in multiple primaries at the same time. In this case, since we don't know the order they will be submitted to the GPU, we have to do the copy at the last minute. It may be a bit more performant to do the CPU memcpy in that case.<br><br>I didn't do that yet because I wanted to let it bake for a while with just the GPU memcpy. The CPU memcpy is guaranteed to be reliable but the GPU memcpy isn't. By leaving it as just GPU memcpy, we'll get better tests coverage of the crazy path. If we find that it's causing problems, doing a CPU copy is a nice little optimization to have in our back pockets.<br><br></div><div>--Jason<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="HOEnZb"><font color="#888888">
- Nanley<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
><br>
> It's kind of a crazy scheme but I like the end result quite a bit.<br>
><br>
> Cc: Kristian Høgsberg Kristensen <<a href="mailto:krh@bitplanet.net">krh@bitplanet.net</a>><br>
> Cc: Chad Versace <<a href="mailto:chadversary@chromium.org">chadversary@chromium.org</a>><br>
> Cc: Nanley Chery <<a href="mailto:nanley.g.chery@intel.com">nanley.g.chery@intel.com</a>><br>
> Cc: Topi Pohjolainen <<a href="mailto:topi.pohjolainen@intel.com">topi.pohjolainen@intel.com</a>><br>
><br>
> Jason Ekstrand (25):<br>
> intel/isl: Add some basic info about RENDER_SURFACE_STATE to<br>
> isl_device<br>
> intel/genxml: Add SO_WRITE_OFFSET registers for gen7-9<br>
> anv: Add a helper for doing buffer copies with nothing but VF and SOL.<br>
> anv/cmd_buffer: Use the surface state alloc helper in<br>
> null_surface_state<br>
> anv/cmd_buffer: Expose add_surface_state_reloc as an inline helper<br>
> anv: Rework the way render target surfaces are allocated<br>
> anv/cmd_buffer: Stop relying on the framebuffer for 3DSTATE_SF on gen7<br>
> intel/genxml: Make some VS/GS fields consistent across gens<br>
> intel/blorp: Make the number of samples an explicit parameter<br>
> intel/blorp: Add a shader type to make keys more unique<br>
> intel/blorp: Remove NIR support for uniforms<br>
> intel/blorp: Rename compile_nir_shader to compile_fs<br>
> intel/blorp: Rework our usage of ralloc when compiling shaders<br>
> intel/blorp: Handle NIR clear inputs the same way as blit inputs<br>
> blorp/exec: Use uint32_t for copying varying data<br>
> intel/blorp: Use an actual chunk of vertex buffer for the VUE header<br>
> intel/blorp: Add support for vertex shaders<br>
> intel/blorp: Add capability to use pre-baked binding tables<br>
> intel/blorp: Add a clear_attachments entrypoint<br>
> anv: Bring back anv_cmd_buffer_emit_state_<wbr>base_address<br>
> anv/blorp: Break the guts of alloc_binding_table into a shared helper<br>
> anv/blorp: Use the new clear_attachments entrypoint for attachment<br>
> clears<br>
> anv: Set framebuffer to NULL in secondary command buffers<br>
> Allocate a null state whenever there is depth/stencil<br>
> anv/blorp: Handle VK_ATTACHMENT_UNUSED in CmdClearAttachments<br>
><br>
> src/intel/blorp/blorp.c | 74 ++++----<br>
> src/intel/blorp/blorp.h | 11 ++<br>
> src/intel/blorp/blorp_blit.c | 34 ++--<br>
> src/intel/blorp/blorp_clear.c | 189 +++++++++++++++++--<br>
> src/intel/blorp/blorp_genX_<wbr>exec.h | 196 ++++++++++---------<br>
> src/intel/blorp/blorp_priv.h | 52 ++++-<br>
> src/intel/genxml/gen6.xml | 6 +-<br>
> src/intel/genxml/gen7.xml | 22 ++-<br>
> src/intel/genxml/gen75.xml | 22 ++-<br>
> src/intel/genxml/gen8.xml | 16 ++<br>
> src/intel/genxml/gen9.xml | 16 ++<br>
> src/intel/isl/isl.c | 19 ++<br>
> src/intel/isl/isl.h | 11 ++<br>
> src/intel/vulkan/Makefile.<wbr>sources | 4 +<br>
> src/intel/vulkan/anv_batch_<wbr>chain.c | 4 +-<br>
> src/intel/vulkan/anv_blorp.c | 125 +++++++++----<br>
> src/intel/vulkan/anv_cmd_<wbr>buffer.c | 74 ++------<br>
> src/intel/vulkan/anv_genX.h | 5 +<br>
> src/intel/vulkan/anv_image.c | 22 ---<br>
> src/intel/vulkan/anv_private.h | 42 ++++-<br>
> src/intel/vulkan/gen7_cmd_<wbr>buffer.c | 43 +++--<br>
> src/intel/vulkan/gen7_<wbr>pipeline.c | 6 +-<br>
> src/intel/vulkan/genX_blorp_<wbr>exec.c | 18 +-<br>
> src/intel/vulkan/genX_cmd_<wbr>buffer.c | 229 +++++++++++++++++------<br>
> src/intel/vulkan/genX_gpu_<wbr>memcpy.c | 223 ++++++++++++++++++++++<br>
> src/mesa/drivers/dri/i965/brw_<wbr>wm_surface_state.c | 56 +++---<br>
> 26 files changed, 1105 insertions(+), 414 deletions(-)<br>
> create mode 100644 src/intel/vulkan/genX_gpu_<wbr>memcpy.c<br>
><br>
> --<br>
> 2.5.0.400.gff86faf<br>
><br>
</div></div><div class="HOEnZb"><div class="h5">> ______________________________<wbr>_________________<br>
> mesa-dev mailing list<br>
> <a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a><br>
> <a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/<wbr>mailman/listinfo/mesa-dev</a><br>
</div></div></blockquote></div><br></div></div>