[Mesa-dev] [PATCH v2 3/4] anv/pass: Record required pipe flushes
Nanley Chery
nanleychery at gmail.com
Thu Apr 6 23:13:15 UTC 2017
On Thu, Apr 06, 2017 at 11:46:17AM -0700, Jason Ekstrand wrote:
> On Thu, Mar 30, 2017 at 1:59 PM, Nanley Chery <nanleychery at gmail.com> wrote:
>
> > On Wed, Mar 15, 2017 at 12:01:16PM -0700, Jason Ekstrand wrote:
> > > ---
> > > src/intel/vulkan/anv_pass.c | 89 ++++++++++++++++++++++++++++++
> > ++++++++++++
> > > src/intel/vulkan/anv_private.h | 2 +
> > > 2 files changed, 91 insertions(+)
> > >
> > > diff --git a/src/intel/vulkan/anv_pass.c b/src/intel/vulkan/anv_pass.c
> > > index 8d1768d..0f71fb3 100644
> > > --- a/src/intel/vulkan/anv_pass.c
> > > +++ b/src/intel/vulkan/anv_pass.c
> > > @@ -98,6 +98,7 @@ VkResult anv_CreateRenderPass(
> > > if (pass->subpass_attachments == NULL)
> > > goto fail_subpass_usages;
> > >
> > > + bool has_color = false, has_depth = false, has_input = false;
> > > p = pass->subpass_attachments;
> > > for (uint32_t i = 0; i < pCreateInfo->subpassCount; i++) {
> > > const VkSubpassDescription *desc = &pCreateInfo->pSubpasses[i];
> > > @@ -115,6 +116,7 @@ VkResult anv_CreateRenderPass(
> > > uint32_t a = desc->pInputAttachments[j].attachment;
> > > subpass->input_attachments[j] = desc->pInputAttachments[j];
> > > if (a != VK_ATTACHMENT_UNUSED) {
> > > + has_input = true;
> > > pass->attachments[a].usage |= VK_IMAGE_USAGE_INPUT_
> > ATTACHMENT_BIT;
> > > pass->attachments[a].subpass_usage[i] |=
> > ANV_SUBPASS_USAGE_INPUT;
> > > pass->attachments[a].last_subpass_idx = i;
> > > @@ -134,6 +136,7 @@ VkResult anv_CreateRenderPass(
> > > uint32_t a = desc->pColorAttachments[j].attachment;
> > > subpass->color_attachments[j] = desc->pColorAttachments[j];
> > > if (a != VK_ATTACHMENT_UNUSED) {
> > > + has_color = true;
> > > pass->attachments[a].usage |= VK_IMAGE_USAGE_COLOR_
> > ATTACHMENT_BIT;
> > > pass->attachments[a].subpass_usage[i] |=
> > ANV_SUBPASS_USAGE_DRAW;
> > > pass->attachments[a].last_subpass_idx = i;
> > > @@ -170,6 +173,7 @@ VkResult anv_CreateRenderPass(
> > > *p++ = subpass->depth_stencil_attachment =
> > > *desc->pDepthStencilAttachment;
> > > if (a != VK_ATTACHMENT_UNUSED) {
> > > + has_depth = true;
> > > pass->attachments[a].usage |=
> > > VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT;
> > > pass->attachments[a].subpass_usage[i] |=
> > ANV_SUBPASS_USAGE_DRAW;
> > > @@ -181,10 +185,94 @@ VkResult anv_CreateRenderPass(
> > > }
> > > }
> > >
> > > + pass->subpass_flushes =
> > > + vk_zalloc2(&device->alloc, pAllocator,
> > > + (pass->subpass_count + 1) *
> > sizeof(*pass->subpass_flushes),
> > > + 8, VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
> > > + if (pass->subpass_flushes == NULL)
> > > + goto fail_subpass_attachments;
> > > +
> >
> > I think we should allocate this memory differently. We currently have
> > two memory allocation methods in this function:
> > 1. Allocate memory per data structure. This makes pointer
> > assignments simple and using gotos improves error handling in this
> > case.
> > 2. Allocate one big block of memory and assign pointers for certain
> > offsets into this block. This is a more efficient usage of memory
> > and error handling is quite simple in this case.
> >
> > For this series, I think the memory allocation for ::subpass_flushes
> > should be lumped into the allocation for the entire render pass object.
> > It's simple to do because we know the size upfront and we can avoid
> > having to add the goto infrastructure.
> >
>
> Yes, I debated back and forth with myself over what to do here. There are
> a number of places in the Vulkan driver where we've tried to do a single
> allocation and every time it comes at the cost of having to figure out the
> size up-front (cost both in terms of compute time to do the calculation and
> also in terms of having the code split up). Admittedly, that usually isn't
> a huge deal, but with something as big as subpass, it gets tricky fast.
> When I was working on this, I considered creating a little helper data
> structure for doing these kinds of allocations that would make it
> significantly more reliable. But then I realized that the only place we
> would use it was in the subpass and I didn't bother. I'm willing to revive
> it if you'd like.
>
>
I agree, dealing with the allocations does get tricky with this struct.
Sure, I'd like to see the helper data structure. I'm also okay with
giving ::subpass_flushes the same treatment we're giving ::attachments.
I also thought of creating a helper for these kinds of allocations, but
it turned out to be a function instead of a data structure.
> > > + for (uint32_t i = 0; i < pCreateInfo->dependencyCount; i++) {
> > > + const VkSubpassDependency *dep = &pCreateInfo->pDependencies[i];
> > > + if (dep->dstSubpass != VK_SUBPASS_EXTERNAL) {
> > > + assert(dep->dstSubpass < pass->subpass_count);
> > > + pass->subpass_flushes[dep->dstSubpass] |=
> > > + anv_pipe_invalidate_bits_for_access_flags(dep->
> > dstAccessMask);
> > > + }
> > > +
> > > + if (dep->srcSubpass != VK_SUBPASS_EXTERNAL) {
> > > + assert(dep->srcSubpass < pass->subpass_count);
> > > + pass->subpass_flushes[dep->srcSubpass + 1] |=
> > > + anv_pipe_flush_bits_for_access_flags(dep->srcAccessMask);
> > > + }
> > > + }
> >
> > Why set the flush bits at srcSubpass + 1? This can cause excessive
> > flushing. We can avoid this excess by setting the flush bits at
> > dstSubpass (like we do for the invalidate bits).
> >
>
> How is this excessive? If anything, by doing the flushing earlier, it's
> more likely to get pipelined and not require stalling the GPU.
>
>
I'm no longer opposed to setting the bits at srcSubpass + 1.
I was thinking of the case where you have a render pass with multiple
subpasses that render to color attachments and a single subpass at the
end that textures from these color attachments.
You'd emit flush instructions after each rendering subpass, and one
invalidate for the final subpass. If we flush at dstSubpass, we'd flush
once and then invalidate in the same subpass.
I just took another look at genX(cmd_buffer_apply_pipe_flushes), and I
now see the stalling that flushing at dstSubpass would entail in the
above example. I don't know which would be more expensive, but I'd
assume that stalling would.
> > I see that the implicit dependency with VK_SUBPASS_EXTERNAL is covered
> > below, but explicit dependencies should be covered as well. For example,
> > the above doesn't handle the case where a user declares an external
> > dependency that has VK_ACCESS_SHADER_WRITE_BIT set in the srcSubpass.
> > According to PipelineBarrier, this would require a DATA_CACHE flush.
> >
>
> Right... I thought it was ok when I wrote the code, but you've made me
> question that conviction. How about something like this:
>
> if (dep->dstSubpass == VK_SUBPASS_EXTERNAL) {
> pass->subpass_flushes[pass->subpass_count] |=
> anv_pipe_invalidate_bits_for_access_flags(dep->dstAccessMask);
> } else {
> assert(dep->dstSubpass < pass->subpass_count);
> pass->subpass_flushes[dep->dstSubpass] |=
> anv_pipe_invalidate_bits_for_access_flags(dep->dstAccessMask);
> }
>
Yes, that looks good.
-Nanley
> --Jason
>
>
> > -Nanley
> >
> > > +
> > > + /* From the Vulkan 1.0.39 spec:
> > > + *
> > > + * If there is no subpass dependency from VK_SUBPASS_EXTERNAL to
> > the
> > > + * first subpass that uses an attachment, then an implicit subpass
> > > + * dependency exists from VK_SUBPASS_EXTERNAL to the first
> > subpass it is
> > > + * used in. The subpass dependency operates as if defined with the
> > > + * following parameters:
> > > + *
> > > + * VkSubpassDependency implicitDependency = {
> > > + * .srcSubpass = VK_SUBPASS_EXTERNAL;
> > > + * .dstSubpass = firstSubpass; // First subpass attachment is
> > used in
> > > + * .srcStageMask = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
> > > + * .dstStageMask = VK_PIPELINE_STAGE_ALL_COMMANDS_BIT;
> > > + * .srcAccessMask = 0;
> > > + * .dstAccessMask = VK_ACCESS_INPUT_ATTACHMENT_READ_BIT |
> > > + * VK_ACCESS_COLOR_ATTACHMENT_READ_BIT |
> > > + * VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT |
> > > + * VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT
> > |
> > > + * VK_ACCESS_DEPTH_STENCIL_
> > ATTACHMENT_WRITE_BIT;
> > > + * .dependencyFlags = 0;
> > > + * };
> > > + *
> > > + * Similarly, if there is no subpass dependency from the last
> > subpass
> > > + * that uses an attachment to VK_SUBPASS_EXTERNAL, then an
> > implicit
> > > + * subpass dependency exists from the last subpass it is used in
> > to
> > > + * VK_SUBPASS_EXTERNAL. The subpass dependency operates as if
> > defined
> > > + * with the following parameters:
> > > + *
> > > + * VkSubpassDependency implicitDependency = {
> > > + * .srcSubpass = lastSubpass; // Last subpass attachment is
> > used in
> > > + * .dstSubpass = VK_SUBPASS_EXTERNAL;
> > > + * .srcStageMask = VK_PIPELINE_STAGE_ALL_COMMANDS_BIT;
> > > + * .dstStageMask = VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT;
> > > + * .srcAccessMask = VK_ACCESS_INPUT_ATTACHMENT_READ_BIT |
> > > + * VK_ACCESS_COLOR_ATTACHMENT_READ_BIT |
> > > + * VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT |
> > > + * VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT
> > |
> > > + * VK_ACCESS_DEPTH_STENCIL_
> > ATTACHMENT_WRITE_BIT;
> > > + * .dstAccessMask = 0;
> > > + * .dependencyFlags = 0;
> > > + * };
> > > + *
> > > + * We could implement this by walking over all of the attachments and
> > > + * subpasses and checking to see if any of them don't have an
> > external
> > > + * dependency. Or, we could just be lazy and add a couple extra
> > flushes.
> > > + * We choose to be lazy.
> > > + */
> > > + if (has_input) {
> > > + pass->subpass_flushes[0] |=
> > > + ANV_PIPE_TEXTURE_CACHE_INVALIDATE_BIT;
> > > + }
> > > + if (has_color) {
> > > + pass->subpass_flushes[pass->subpass_count] |=
> > > + ANV_PIPE_RENDER_TARGET_CACHE_FLUSH_BIT;
> > > + }
> > > + if (has_depth) {
> > > + pass->subpass_flushes[pass->subpass_count] |=
> > > + ANV_PIPE_DEPTH_CACHE_FLUSH_BIT;
> > > + }
> > > +
> > > *pRenderPass = anv_render_pass_to_handle(pass);
> > >
> > > return VK_SUCCESS;
> > >
> > > +fail_subpass_attachments:
> > > + vk_free2(&device->alloc, pAllocator, pass->subpass_attachments);
> > > fail_subpass_usages:
> > > vk_free2(&device->alloc, pAllocator, pass->subpass_usages);
> > > fail_pass:
> > > @@ -204,6 +292,7 @@ void anv_DestroyRenderPass(
> > > if (!pass)
> > > return;
> > >
> > > + vk_free2(&device->alloc, pAllocator, pass->subpass_flushes);
> > > vk_free2(&device->alloc, pAllocator, pass->subpass_attachments);
> > > vk_free2(&device->alloc, pAllocator, pass->subpass_usages);
> > > vk_free2(&device->alloc, pAllocator, pass);
> > > diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_
> > private.h
> > > index a0eefe3..53d39ee 100644
> > > --- a/src/intel/vulkan/anv_private.h
> > > +++ b/src/intel/vulkan/anv_private.h
> > > @@ -2038,6 +2038,8 @@ struct anv_render_pass {
> > > uint32_t attachment_count;
> > > uint32_t subpass_count;
> > > VkAttachmentReference * subpass_attachments;
> > > + /* An array of subpass_count+1 flushes, one per subpass boundary */
> > > + enum anv_pipe_bits * subpass_flushes;
> > > enum anv_subpass_usage * subpass_usages;
> > > struct anv_render_pass_attachment * attachments;
> > > struct anv_subpass subpasses[0];
> > > --
> > > 2.5.0.400.gff86faf
> > >
> > > _______________________________________________
> > > mesa-dev mailing list
> > > mesa-dev at lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> >
More information about the mesa-dev
mailing list