[Mesa-dev] [PATCH 22/22] anv: Predicate fast-clear resolves

Nanley Chery nanleychery at gmail.com
Wed May 10 23:42:27 UTC 2017


On Wed, May 03, 2017 at 03:39:51PM -0700, Jason Ekstrand wrote:
> On Thu, Apr 27, 2017 at 11:32 AM, Nanley Chery <nanleychery at gmail.com>
> wrote:
> 
> > There's no image layout to represent a full-RT-cleared color attachment.
> > That's one reason we can end up with redundant resolves. Testing has
> > shown that such resolves can measurably hurt performance and that
> > predicating them can avoid the penalty.
> >
> > Signed-off-by: Nanley Chery <nanley.g.chery at intel.com>
> > ---
> >  src/intel/vulkan/anv_blorp.c       |   3 +-
> >  src/intel/vulkan/anv_image.c       |   6 +++
> >  src/intel/vulkan/genX_cmd_buffer.c | 106 ++++++++++++++++++++++++++++++
> > ++++---
> >  3 files changed, 108 insertions(+), 7 deletions(-)
> >
> > diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
> > index 821d01a077..32f0edf316 100644
> > --- a/src/intel/vulkan/anv_blorp.c
> > +++ b/src/intel/vulkan/anv_blorp.c
> > @@ -1500,7 +1500,8 @@ anv_ccs_resolve(struct anv_cmd_buffer * const
> > cmd_buffer,
> >
> >     struct blorp_batch batch;
> >     blorp_batch_init(&cmd_buffer->device->blorp, &batch, cmd_buffer,
> > -                    BLORP_BATCH_NO_EMIT_DEPTH_STENCIL);
> > +                    BLORP_BATCH_NO_EMIT_DEPTH_STENCIL |
> > +                    BLORP_BATCH_PREDICATE_ENABLE);
> >
> >     struct blorp_surf surf;
> >     get_blorp_surf_for_anv_image(image, VK_IMAGE_ASPECT_COLOR_BIT,
> > diff --git a/src/intel/vulkan/anv_image.c b/src/intel/vulkan/anv_image.c
> > index 751f2d6026..92ee86dab5 100644
> > --- a/src/intel/vulkan/anv_image.c
> > +++ b/src/intel/vulkan/anv_image.c
> > @@ -208,6 +208,12 @@ anv_fill_ccs_resolve_ss(const struct anv_device *
> > const device,
> >                         .aux_usage = image->aux_usage ==
> > ISL_AUX_USAGE_NONE ?
> >                                      ISL_AUX_USAGE_CCS_D :
> > image->aux_usage,
> >                         .mocs = device->default_mocs);
> > +
> > +   /* The following dword is used as a flag to represent whether or not
> > this
> > +    * CCS subresource needs resolving. We want this to be zero by default,
> > +    * which means that a resolve is not necessary.
> > +    */
> > +   assert(*(uint32_t *)(data + device->isl_dev.ss.addr_offset) == 0);
> >
> 
> This seems like kind-of an odd place to put it... Not a problem, just odd.
> 
> 

I agree. It's logically better off in anv_BindImage. It's mainly here
for convenience (the data pointer is already calculated for us).

> >  }
> >
> >  /**
> > diff --git a/src/intel/vulkan/genX_cmd_buffer.c
> > b/src/intel/vulkan/genX_cmd_buffer.c
> > index 95729ec8a8..041301290e 100644
> > --- a/src/intel/vulkan/genX_cmd_buffer.c
> > +++ b/src/intel/vulkan/genX_cmd_buffer.c
> > @@ -402,6 +402,82 @@ transition_depth_buffer(struct anv_cmd_buffer
> > *cmd_buffer,
> >        anv_gen8_hiz_op_resolve(cmd_buffer, image, hiz_op);
> >  }
> >
> > +static uint32_t
> > +clear_value_buffer_offset(const struct anv_cmd_buffer * const cmd_buffer,
> > +                          const struct anv_image * const image,
> > +                          const uint8_t level)
> > +{
> > +   return image->offset +
> > +          image->aux_surface.offset + image->aux_surface.isl.size +
> > +          cmd_buffer->device->isl_dev.ss.size * level;
> > +}
> > +
> > +#define MI_PREDICATE_SRC0  0x2400
> > +#define MI_PREDICATE_SRC1  0x2408
> > +
> > +enum ccs_resolve_state {
> > +   CCS_RESOLVE_NOT_NEEDED,
> > +   CCS_RESOLVE_NEEDED,
> > +   CCS_RESOLVE_STARTING,
> > +};
> > +
> > +/* Manages the state of an color image subresource to ensure resolves are
> > + * performed properly.
> > + */
> > +static void
> > +genX(set_resolve_state)(struct anv_cmd_buffer * const cmd_buffer,
> > +                        const struct anv_image * const image,
> > +                        const uint8_t level,
> > +                        const enum ccs_resolve_state state)
> > +{
> > +   assert(cmd_buffer && image);
> > +
> > +   /* The image subresource range must have a color auxiliary buffer. */
> > +   assert(anv_image_has_color_aux(image));
> > +   assert(level < anv_color_aux_levels(image));
> > +
> > +   /* We store the resolve flag in the location of the surface base
> > address
> > +    * field for simplicity and because it is initialized to zero when the
> > +    * clear value buffer is initialized.
> > +    */
> > +   const uint32_t resolve_flag_offset =
> > +      clear_value_buffer_offset(cmd_buffer, image, level) +
> > +      cmd_buffer->device->isl_dev.ss.addr_offset;
> > +
> > +   /* An update operation should not overwrite the fast clear value. */
> > +   if (cmd_buffer->device->isl_dev.ss.addr_offset <
> > +       cmd_buffer->device->isl_dev.ss.clear_value_offset) {
> > +      assert(cmd_buffer->device->isl_dev.ss.addr_offset + 4 <=
> > +             cmd_buffer->device->isl_dev.ss.clear_value_offset);
> > +   }
> > +
> > +   if (state != CCS_RESOLVE_STARTING) {
> > +      assert(state == CCS_RESOLVE_NEEDED || state ==
> > CCS_RESOLVE_NOT_NEEDED);
> > +      /* The HW docs say that there is no way to guarantee the completion
> > of
> > +       * the following command. We use it nevertheless because it shows no
> > +       * issues in testing is currently being used in the GL driver.
> > +       */
> > +      anv_batch_emit(&cmd_buffer->batch, GENX(MI_STORE_DATA_IMM), sdi) {
> > +         sdi.Address = (struct anv_address) { image->bo,
> > resolve_flag_offset };
> > +         sdi.ImmediateData = state == CCS_RESOLVE_NEEDED;
> > +      }
> > +   } else {
> > +      /* Make the pending predicated resolve a no-op if one is not needed.
> > +       * predicate = do_resolve = resolve_flag != 0;
> > +       */
> > +      emit_lri(&cmd_buffer->batch, MI_PREDICATE_SRC1    , 0);
> > +      emit_lri(&cmd_buffer->batch, MI_PREDICATE_SRC1 + 4, 0);
> > +      emit_lri(&cmd_buffer->batch, MI_PREDICATE_SRC0    , 0);
> > +      emit_lrm(&cmd_buffer->batch, MI_PREDICATE_SRC0 + 4,
> > +               image->bo, resolve_flag_offset);
> > +      anv_batch_emit(&cmd_buffer->batch, GENX(MI_PREDICATE), mip) {
> > +         mip.LoadOperation    = LOAD_LOADINV;
> > +         mip.CombineOperation = COMBINE_SET;
> > +         mip.CompareOperation = COMPARE_SRCS_EQUAL;
> > +      }
> >
> 
> I really wish the MI_PREDICATE were closer to the anv_ccs_resolve.  Maybe
> we could just inline this function?  If we had a helper to get the
> anv_address of the resolve flag, it wouldn't be bad at all.
> 
> 

I also wanted it to be closer. I came up with the current implementation
to work around the anv_blorp / per-gen-compilation divide. What do you
have in mind for inlining? I tried to think something up but it doesn't
look very good.


More information about the mesa-dev mailing list