[Mesa-dev] [PATCH 13/29] anv/cmd_buffer: Rework aux tracking
Jason Ekstrand
jason at jlekstrand.net
Sun Jan 14 05:51:34 UTC 2018
On Fri, Jan 5, 2018 at 4:34 PM, Nanley Chery <nanleychery at gmail.com> wrote:
> On Mon, Nov 27, 2017 at 07:06:03PM -0800, Jason Ekstrand wrote:
> > This makes us start tracking two bits per level for aux to describe both
> > whether or not something is fast-cleared and whether or not it is
> > compressed as opposed to a single "needs resolve" bit. The previous
> > scheme only worked because we assumed that CCS_E compressed images would
> > always be read with CCS_E enabled and so they didn't need any sort of
> > resolve if there was no fast clear color.
> >
> > The assumptions of the previous scheme held because the one case where
> > we do need a full resolve when CCS_E is enabled is for window-system
> > images. Since we only ever allowed X-tiled window-system images, CCS
> > was entirely disabled on gen9+ and we never got CCS_E. With the advent
> > of Y-tiled window-system buffers, we now need to properly support doing
> > a full resolve images marked CCS_E. This requires us to track things
> > more granularly because, if the client does two back-to-back transitions
> > where we first do a partial resolve and then a full resolve, we need
> > both resolves to happen.
>
> This commit message says that this patch is necessary to do a full
> resolve with CCS_E, why is this so? Weren't the requirements fulfilled
> with the "anv/cmd_buffer: Generalize transition_color_buffer" patch
> (and my comment about disabling predication applied)?
>
Yes, we could just disable predication for full resolves on CCS_E but is
that going to lead to extra resolves? Also, as per comments about clear
colors, this is going to have to be revised to force a resolve even if the
clear color is zero. In particular, if we have CCS_E with a zero clear
color and we're transitioning it to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR and the
image has I915_FORMAT_MOD_Y_TILED_CCS then we need to force a partial
resolve. It's kind-of a mess and I don't see a good way to do it without
either tracking it all or having a bunch of redundant resolves.
> -Nanley
>
> > ---
> > src/intel/vulkan/anv_private.h | 10 +--
> > src/intel/vulkan/genX_cmd_buffer.c | 158 ++++++++++++++++++++++++++++++
> -------
> > 2 files changed, 135 insertions(+), 33 deletions(-)
> >
> > diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_
> private.h
> > index f805246..e875705 100644
> > --- a/src/intel/vulkan/anv_private.h
> > +++ b/src/intel/vulkan/anv_private.h
> > @@ -2476,16 +2476,16 @@ anv_fast_clear_state_entry_size(const struct
> anv_device *device)
> > {
> > assert(device);
> > /* Entry contents:
> > - * +--------------------------------------------+
> > - * | clear value dword(s) | needs resolve dword |
> > - * +--------------------------------------------+
> > + * +---------------------------------------------+
> > + * | clear value dword(s) | needs resolve dwords |
> > + * +---------------------------------------------+
> > */
> >
> > - /* Ensure that the needs resolve dword is in fact dword-aligned to
> enable
> > + /* Ensure that the needs resolve dwords are in fact dword-aligned to
> enable
> > * GPU memcpy operations.
> > */
> > assert(device->isl_dev.ss.clear_value_size % 4 == 0);
> > - return device->isl_dev.ss.clear_value_size + 4;
> > + return device->isl_dev.ss.clear_value_size + 8;
> > }
> >
> > static inline struct anv_address
> > diff --git a/src/intel/vulkan/genX_cmd_buffer.c
> b/src/intel/vulkan/genX_cmd_buffer.c
> > index 6aeffa3..2d47179 100644
> > --- a/src/intel/vulkan/genX_cmd_buffer.c
> > +++ b/src/intel/vulkan/genX_cmd_buffer.c
> > @@ -404,6 +404,22 @@ transition_depth_buffer(struct anv_cmd_buffer
> *cmd_buffer,
> > 0, 0, 1, hiz_op);
> > }
> >
> > +/** Bitfield representing the needed resolves.
> > + *
> > + * This bitfield represents the kinds of compression we may or may not
> have in
> > + * a given image. The ANV_IMAGE_NEEDS_* can be ANDed with the
> ANV_IMAGE_HAS
> > + * bits to determine when a given resolve actually needs to happen.
> > + *
> > + * For convenience of dealing with MI_PREDICATE, we blow these two bits
> out
> > + * into two dwords in the image meatadata page.
> > + */
> > +enum anv_image_resolve_bits {
> > + ANV_IMAGE_HAS_FAST_CLEAR_BIT = 0x1,
> > + ANV_IMAGE_HAS_COMPRESSION_BIT = 0x2,
> > + ANV_IMAGE_NEEDS_PARTIAL_RESOLVE_BITS = 0x1,
> > + ANV_IMAGE_NEEDS_FULL_RESOLVE_BITS = 0x3,
> > +};
> > +
> > #define MI_PREDICATE_SRC0 0x2400
> > #define MI_PREDICATE_SRC1 0x2408
> >
> > @@ -411,23 +427,62 @@ transition_depth_buffer(struct anv_cmd_buffer
> *cmd_buffer,
> > * performed properly.
> > */
> > static void
> > -set_image_needs_resolve(struct anv_cmd_buffer *cmd_buffer,
> > - const struct anv_image *image,
> > - VkImageAspectFlagBits aspect,
> > - unsigned level, bool needs_resolve)
> > +set_image_needs_resolve_bits(struct anv_cmd_buffer *cmd_buffer,
> > + const struct anv_image *image,
> > + VkImageAspectFlagBits aspect,
> > + unsigned level,
> > + enum anv_image_resolve_bits set_bits)
> > {
> > assert(cmd_buffer && image);
> > assert(image->aspects & VK_IMAGE_ASPECT_ANY_COLOR_BIT_ANV);
> > assert(level < anv_image_aux_levels(image, aspect));
> >
> > - /* The HW docs say that there is no way to guarantee the completion
> of
> > - * the following command. We use it nevertheless because it shows no
> > - * issues in testing is currently being used in the GL driver.
> > - */
> > - anv_batch_emit(&cmd_buffer->batch, GENX(MI_STORE_DATA_IMM), sdi) {
> > - sdi.Address = anv_image_get_needs_resolve_
> addr(cmd_buffer->device,
> > - image, aspect,
> level);
> > - sdi.ImmediateData = needs_resolve;
> > + const struct anv_address resolve_flag_addr =
> > + anv_image_get_needs_resolve_addr(cmd_buffer->device,
> > + image, aspect, level);
> > +
> > + if (set_bits & ANV_IMAGE_HAS_FAST_CLEAR_BIT) {
> > + anv_batch_emit(&cmd_buffer->batch, GENX(MI_STORE_DATA_IMM), sdi)
> {
> > + sdi.Address = resolve_flag_addr;
> > + sdi.ImmediateData = 1;
> > + }
> > + }
> > + if (set_bits & ANV_IMAGE_HAS_COMPRESSION_BIT) {
> > + anv_batch_emit(&cmd_buffer->batch, GENX(MI_STORE_DATA_IMM), sdi)
> {
> > + sdi.Address = resolve_flag_addr;
> > + sdi.Address.offset += 4;
> > + sdi.ImmediateData = 1;
> > + }
> > + }
> > +}
> > +
> > +static void
> > +clear_image_needs_resolve_bits(struct anv_cmd_buffer *cmd_buffer,
> > + const struct anv_image *image,
> > + VkImageAspectFlagBits aspect,
> > + unsigned level,
> > + enum anv_image_resolve_bits clear_bits)
> > +{
> > + assert(cmd_buffer && image);
> > + assert(image->aspects & VK_IMAGE_ASPECT_ANY_COLOR_BIT_ANV);
> > + assert(level < anv_image_aux_levels(image, aspect));
> > +
> > + const struct anv_address resolve_flag_addr =
> > + anv_image_get_needs_resolve_addr(cmd_buffer->device,
> > + image, aspect, level);
> > +
> > + if (clear_bits & ANV_IMAGE_HAS_FAST_CLEAR_BIT) {
> > + anv_batch_emit(&cmd_buffer->batch, GENX(MI_STORE_DATA_IMM), sdi)
> {
> > + sdi.Address = resolve_flag_addr;
> > + sdi.ImmediateData = 0;
> > + }
> > + }
> > + if (clear_bits & ANV_IMAGE_HAS_COMPRESSION_BIT) {
> > + anv_batch_emit(&cmd_buffer->batch, GENX(MI_STORE_DATA_IMM), sdi)
> {
> > + sdi.Address = resolve_flag_addr;
> > + sdi.Address.offset += 4;
> > + sdi.ImmediateData = 0;
> > + }
> > }
> > }
> >
> > @@ -435,7 +490,8 @@ static void
> > load_needs_resolve_predicate(struct anv_cmd_buffer *cmd_buffer,
> > const struct anv_image *image,
> > VkImageAspectFlagBits aspect,
> > - unsigned level)
> > + unsigned level,
> > + enum anv_image_resolve_bits bits)
> > {
> > assert(cmd_buffer && image);
> > assert(image->aspects & VK_IMAGE_ASPECT_ANY_COLOR_BIT_ANV);
> > @@ -450,9 +506,27 @@ load_needs_resolve_predicate(struct anv_cmd_buffer
> *cmd_buffer,
> > */
> > emit_lri(&cmd_buffer->batch, MI_PREDICATE_SRC1 , 0);
> > emit_lri(&cmd_buffer->batch, MI_PREDICATE_SRC1 + 4, 0);
> > - emit_lri(&cmd_buffer->batch, MI_PREDICATE_SRC0 , 0);
> > - emit_lrm(&cmd_buffer->batch, MI_PREDICATE_SRC0 + 4,
> > - resolve_flag_addr.bo, resolve_flag_addr.offset);
> > +
> > + /* Conditionally load the two dwords into the high and low portions
> of
> > + * MI_PREDICATE_SRC0. This effectively ANDs the bits passed into
> this
> > + * function with the logical bits stored in the metadata page.
> Because
> > + * they're split out with one bit per dword, we don't need to use any
> > + * sort of MI math.
> > + */
> > + if (bits & ANV_IMAGE_HAS_FAST_CLEAR_BIT) {
> > + emit_lrm(&cmd_buffer->batch, MI_PREDICATE_SRC0,
> > + resolve_flag_addr.bo, resolve_flag_addr.offset);
> > + } else {
> > + emit_lri(&cmd_buffer->batch, MI_PREDICATE_SRC0, 0);
> > + }
> > +
> > + if (bits & ANV_IMAGE_HAS_COMPRESSION_BIT) {
> > + emit_lrm(&cmd_buffer->batch, MI_PREDICATE_SRC0 + 4,
> > + resolve_flag_addr.bo, resolve_flag_addr.offset + 4);
> > + } else {
> > + emit_lri(&cmd_buffer->batch, MI_PREDICATE_SRC0 + 4, 0);
> > + }
> > +
> > anv_batch_emit(&cmd_buffer->batch, GENX(MI_PREDICATE), mip) {
> > mip.LoadOperation = LOAD_LOADINV;
> > mip.CombineOperation = COMBINE_SET;
> > @@ -467,6 +541,18 @@ genX(cmd_buffer_mark_image_written)(struct
> anv_cmd_buffer *cmd_buffer,
> > enum isl_aux_usage aux_usage,
> > unsigned level)
> > {
> > + /* The only compression types with more than just fast-clears are
> MCS,
> > + * CCS_E, and HiZ. With HiZ we just trust the layout and don't
> actually
> > + * track the current fast-clear and compression state. This leaves
> us
> > + * with just MCS and CCS_E.
> > + */
> > + if (aux_usage != ISL_AUX_USAGE_CCS_E &&
> > + aux_usage != ISL_AUX_USAGE_MCS)
> > + return;
> > +
> > + set_image_needs_resolve_bits(cmd_buffer, image,
> > + VK_IMAGE_ASPECT_COLOR_BIT, level,
> > + ANV_IMAGE_HAS_COMPRESSION_BIT);
> > }
> >
> > static void
> > @@ -488,8 +574,11 @@ init_fast_clear_state_entry(struct anv_cmd_buffer
> *cmd_buffer,
> > * to return incorrect data. The fast clear data in CCS_D buffers
> should
> > * be removed because CCS_D isn't enabled all the time.
> > */
> > - set_image_needs_resolve(cmd_buffer, image, aspect, level,
> > - aux_usage == ISL_AUX_USAGE_NONE);
> > + if (aux_usage == ISL_AUX_USAGE_NONE) {
> > + set_image_needs_resolve_bits(cmd_buffer, image, aspect, level,
> ~0);
> > + } else {
> > + clear_image_needs_resolve_bits(cmd_buffer, image, aspect, level,
> ~0);
> > + }
> >
> > /* The fast clear value dword(s) will be copied into a surface state
> object.
> > * Ensure that the restrictions of the fields in the dword(s) are
> followed.
> > @@ -812,12 +901,25 @@ transition_color_buffer(struct anv_cmd_buffer
> *cmd_buffer,
> > layer_count = MIN2(layer_count, anv_image_aux_layers(image,
> aspect, level));
> > }
> >
> > - load_needs_resolve_predicate(cmd_buffer, image, aspect, level);
> > + enum anv_image_resolve_bits resolve_bits;
> > + switch (resolve_op) {
> > + case ISL_AUX_OP_FULL_RESOLVE:
> > + resolve_bits = ANV_IMAGE_NEEDS_FULL_RESOLVE_BITS;
> > + break;
> > + case ISL_AUX_OP_PARTIAL_RESOLVE:
> > + resolve_bits = ANV_IMAGE_NEEDS_PARTIAL_RESOLVE_BITS;
> > + break;
> > + default:
> > + unreachable("Invalid resolve op");
> > + }
> > + load_needs_resolve_predicate(cmd_buffer, image, aspect, level,
> > + resolve_bits);
> >
> > anv_image_ccs_op(cmd_buffer, image, aspect, level,
> > base_layer, layer_count, resolve_op, true);
> >
> > - set_image_needs_resolve(cmd_buffer, image, aspect, level, false);
> > + clear_image_needs_resolve_bits(cmd_buffer, image, aspect,
> > + level, resolve_bits);
> > }
> >
> > cmd_buffer->state.pending_pipe_bits |=
> > @@ -2992,15 +3094,15 @@ cmd_buffer_subpass_sync_fast_clear_values(struct
> anv_cmd_buffer *cmd_buffer)
> > * will match what's in every RENDER_SURFACE_STATE object
> when it's
> > * being used for sampling.
> > */
> > - set_image_needs_resolve(cmd_buffer, iview->image,
> > - VK_IMAGE_ASPECT_COLOR_BIT,
> > - iview->planes[0].isl.base_level,
> > - false);
> > + clear_image_needs_resolve_bits(cmd_buffer, iview->image,
> > + VK_IMAGE_ASPECT_COLOR_BIT,
> > + iview->planes[0].isl.base_
> level,
> > +
> ANV_IMAGE_HAS_FAST_CLEAR_BIT);
> > } else {
> > - set_image_needs_resolve(cmd_buffer, iview->image,
> > - VK_IMAGE_ASPECT_COLOR_BIT,
> > - iview->planes[0].isl.base_level,
> > - true);
> > + set_image_needs_resolve_bits(cmd_buffer, iview->image,
> > + VK_IMAGE_ASPECT_COLOR_BIT,
> > + iview->planes[0].isl.base_
> level,
> > + ANV_IMAGE_HAS_FAST_CLEAR_BIT);
> > }
> > } else if (rp_att->load_op == VK_ATTACHMENT_LOAD_OP_LOAD) {
> > /* The attachment may have been fast-cleared in a previous
> render
> > --
> > 2.5.0.400.gff86faf
> >
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev at lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20180113/16d7e49e/attachment-0001.html>
More information about the mesa-dev
mailing list