[Mesa-dev] [PATCH v2 17/24] anv: Use blorp_ccs_ambiguate instead of fast-clears

Nanley Chery nanleychery at gmail.com
Wed Jan 31 17:36:54 UTC 2018


On Tue, Jan 30, 2018 at 05:14:39PM -0800, Jason Ekstrand wrote:
> On Tue, Jan 30, 2018 at 5:03 PM, Nanley Chery <nanleychery at gmail.com> wrote:
> 
> > On Tue, Jan 30, 2018 at 04:25:59PM -0800, Jason Ekstrand wrote:
> > > On Tue, Jan 30, 2018 at 2:54 PM, Nanley Chery <nanleychery at gmail.com>
> > wrote:
> > >
> > > > On Fri, Jan 19, 2018 at 03:47:34PM -0800, Jason Ekstrand wrote:
> > > > > Even though the blorp pass looks a bit on the sketchy side, the end
> > > > > result in the Vulkan driver is very nice.  Instead of having this
> > weird
> > > > > case where you do a fast clear and then maybe have to resolve, we
> > just
> > > > > do the ambiguate and are done with it.  The ambiguate does exactly
> > what
> > > > > we want of setting all the CCS values to 0 which puts it inot the
> > > >                                                            ^
> > > >                                                            in
> >
> > Typo.
> >
> 
> Yup.  Meant into
> 
> 
> > > > > pass-through state.
> > > > >
> > > > > This should also improve performance a bit in certain cases.  For
> > > > > instance, if we did a transition from UNDEFINED to GENERAL for a
> > surface
> > > > > that doesn't have CCS enabled all the time, we would end up doing a
> > > > > fast-clear and then a full resolve which ends up touching every byte
> > in
> > > > > the main surface as well as the CCS.  With the ambiguate pass, that
> > > > > transition only touches the CCS.
> > > > > ---
> > > > >  src/intel/vulkan/anv_blorp.c       |  5 ++++
> > > > >  src/intel/vulkan/genX_cmd_buffer.c | 54
> > +++++++++---------------------
> > > > --------
> > > > >  2 files changed, 17 insertions(+), 42 deletions(-)
> > > > >
> > > > > diff --git a/src/intel/vulkan/anv_blorp.c
> > b/src/intel/vulkan/anv_blorp.c
> > > > > index 05efc6d..3698543 100644
> > > > > --- a/src/intel/vulkan/anv_blorp.c
> > > > > +++ b/src/intel/vulkan/anv_blorp.c
> > > > > @@ -1792,6 +1792,11 @@ anv_image_ccs_op(struct anv_cmd_buffer
> > > > *cmd_buffer,
> > > > >                          surf.surf->format,
> > isl_to_blorp_fast_clear_op(
> > > > ccs_op));
> > > > >        break;
> > > > >     case ISL_AUX_OP_AMBIGUATE:
> > > > > +      for (uint32_t a = 0; a < layer_count; a++) {
> > > > > +         const uint32_t layer = base_layer + a;
> > > > > +         blorp_ccs_ambiguate(&batch, &surf, level, layer);
> > > > > +      }
> > > > > +      break;
> > > > >     default:
> > > > >        unreachable("Unsupported CCS operation");
> > > > >     }
> > > > > diff --git a/src/intel/vulkan/genX_cmd_buffer.c
> > > > b/src/intel/vulkan/genX_cmd_buffer.c
> > > > > index 77fdadf..9e2eba3 100644
> > > > > --- a/src/intel/vulkan/genX_cmd_buffer.c
> > > > > +++ b/src/intel/vulkan/genX_cmd_buffer.c
> > > > > @@ -486,15 +486,6 @@ init_fast_clear_state_entry(struct
> > anv_cmd_buffer
> > > > *cmd_buffer,
> > > > >     uint32_t plane = anv_image_aspect_to_plane(image->aspects,
> > aspect);
> > > > >     enum isl_aux_usage aux_usage = image->planes[plane].aux_usage;
> > > > >
> > > > > -   /* The resolve flag should updated to signify that
> > > > fast-clear/compression
> > > > > -    * data needs to be removed when leaving the undefined layout.
> > Such
> > > > data
> > > > > -    * may need to be removed if it would cause accesses to the color
> > > > buffer
> > > > > -    * to return incorrect data. The fast clear data in CCS_D buffers
> > > > should
> > > > > -    * be removed because CCS_D isn't enabled all the time.
> > > > > -    */
> > > > > -   genX(set_image_needs_resolve)(cmd_buffer, image, aspect, level,
> > > > > -                                 aux_usage == ISL_AUX_USAGE_NONE);
> > > > > -
> > > > >     /* The fast clear value dword(s) will be copied into a surface
> > state
> > > > object.
> > > > >      * Ensure that the restrictions of the fields in the dword(s) are
> > > > followed.
> > > > >      *
> > > > > @@ -677,10 +668,9 @@ transition_color_buffer(struct anv_cmd_buffer
> > > > *cmd_buffer,
> > > > >        for (unsigned level = base_level; level < last_level_num;
> > level++)
> > > > >           init_fast_clear_state_entry(cmd_buffer, image, aspect,
> > level);
> > > > >
> > > > > -      /* Initialize the aux buffers to enable correct rendering.
> > This
> > > > operation
> > > > > -       * requires up to two steps: one to rid the aux buffer of data
> > > > that may
> > > > > -       * cause GPU hangs, and another to ensure that writes done
> > > > without aux
> > > > > -       * will be visible to reads done with aux.
> > > > > +      /* Initialize the aux buffers to enable correct rendering.  In
> > > > order to
> > > > > +       * ensure that things such as storage images work correctly,
> > aux
> > > > buffers
> > > > > +       * are initialized to the pass-through state.
> > > >
> > > > Only CCS is initialized to the pass-through state while MCS is
> > > > fast-cleared. We may also want to update the comment below since we're
> > > > no longer fast-clearing CCS.
> > > >
> > >
> > > Right.  I've replaced this entire comment with:
> > >
> > >       /* Initialize the aux buffers to enable correct rendering.  In
> > order
> > > to
> > >        * ensure that things such as storage images work correctly, aux
> > > buffers
> > >        * need to be initialized to valid data.
> > >        *
> > >        * Having an aux buffer with invalid data is a problem for two
> > > reasons:
> > >        *
> > >        *  1) Having an invalid value in the buffer can confuse the
> > hardware.
> > >        *     For instance, with CCS_E on SKL, a two-bit CCS value of 2 is
> > >        *     invalid and leads to the hardware doing strange things.  It
> > >        *     doesn't hang as far as we can tell but rendering corruption
> > can
> > >        *     occur.
> > >        *
> > >        *  2) If this transition is into the GENERAL layout and we then
> > use
> > > the
> > >        *     image as a storage image, then we must have the aux buffer
> > in
> > > the
> > >        *     pass-through state so that, if we then go to texture from
> > the
> > >        *     image, we get the results of our storage image writes and
> > not
> > > the
> > >        *     fast clear color or other random data.
> >
> > This reads better. Is 2) an actual issue? When we create surface states
> > for storage images, we set the aux_usage to NONE.
> >
> 
> Yes.  If you do UNDEFINED -> GENERAL -> use for storage -> RENDER_OPTIMAL +
> LOAD_OP_LOAD -> draw to part of it -> TEXTURE_OPTIMAL.
> 
> 

Got it. With these updates, this patch is
Reviewed-by: Nanley Chery <nanley.g.chery at intel.com>

> > - Nanley
> >
> > >        *
> > >        * For CCS both of the problems above are real demonstrable issues.
> > > In
> > >        * that case, the only thing we can do is to perform an ambiguate
> > to
> > >        * transition the aux surface into the pass-through state.
> > >        *
> > >        * For MCS, (2) is never an issue because we don't support
> > > multisampled
> > >        * storage images.  In theory, issue (1) is a problem with MCS but
> > > we've
> > >        * never seen it in the wild.  For 4x and 16x, all bit patters
> > could,
> > > in
> > >        * theory, be interpreted as something but we don't know that all
> > bit
> > >        * patterns are actually valid.  For 2x and 8x, you could easily
> > end
> > > up
> > >        * with the MCS referring to an invalid plane because not all bits
> > of
> > >        * the MCS value are actually used.  Even though we've never seen
> > > issues
> > >        * in the wild, it's best to play it safe and initialize the MCS.
> > We
> > >        * can use a fast-clear for MCS because we only ever touch from
> > render
> > >        * and texture (no image load store).
> > >        */
> > >
> > > >         *
> > > > >         * Having an aux buffer with invalid data is possible for CCS
> > > > buffers
> > > > >         * SKL+ and for MCS buffers with certain sample counts (2x and
> > > > 8x). One
> > > > > @@ -692,16 +682,18 @@ transition_color_buffer(struct anv_cmd_buffer
> > > > *cmd_buffer,
> > > > >         * We don't have any data to show that this is a problem, but
> > we
> > > > want to
> > > > >         * avoid causing difficult-to-debug problems.
> > > > >         */
> > > > > -      if (GEN_GEN >= 9 && image->samples == 1) {
> > > > > +      if (image->samples == 1) {
> > > > >           for (uint32_t l = 0; l < level_count; l++) {
> > > > >              const uint32_t level = base_level + l;
> > > > >              const uint32_t level_layer_count =
> > > > >                 MIN2(layer_count, anv_image_aux_layers(image, aspect,
> > > > level));
> > > > >              anv_image_ccs_op(cmd_buffer, image, aspect, level,
> > > > >                               base_layer, level_layer_count,
> > > > > -                             ISL_AUX_OP_FAST_CLEAR, false);
> > > > > +                             ISL_AUX_OP_AMBIGUATE, false);
> > > > > +            genX(set_image_needs_resolve)(cmd_buffer, image,
> > > > > +                                          aspect, level, false);
> > > > >           }
> > > > > -      } else if (image->samples > 1) {
> > > > > +      } else {
> > > > >           if (image->samples == 4 || image->samples == 16) {
> > > > >              anv_perf_warn(cmd_buffer->device->instance, image,
> > > > >                            "Doing a potentially unnecessary
> > fast-clear
> > > > to "
> > > > > @@ -712,32 +704,10 @@ transition_color_buffer(struct anv_cmd_buffer
> > > > *cmd_buffer,
> > > > >           anv_image_mcs_op(cmd_buffer, image, aspect,
> > > > >                            base_layer, layer_count,
> > > > >                            ISL_AUX_OP_FAST_CLEAR, false);
> > > > > -      }
> > > > > -      /* At this point, some elements of the CCS buffer may have the
> > > > fast-clear
> > > > > -       * bit-arrangement. As the user writes to a subresource, we
> > need
> > > > to have
> > > > > -       * the associated CCS elements enter the ambiguated state.
> > This
> > > > enables
> > > > > -       * reads (implicit or explicit) to reflect the user-written
> > data
> > > > instead
> > > > > -       * of the clear color. The only time such elements will not
> > > > change their
> > > > > -       * state as described above, is in a final layout that doesn't
> > > > have CCS
> > > > > -       * enabled. In this case, we must force the associated CCS
> > > > buffers of the
> > > > > -       * specified range to enter the ambiguated state in advance.
> > > > > -       */
> > > > > -      if (image->samples == 1 &&
> > > > > -          image->planes[plane].aux_usage != ISL_AUX_USAGE_CCS_E &&
> > > > > -          final_layout != VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL)
> > {
> > > > > -         /* The CCS_D buffer may not be enabled in the final layout.
> > > > Call this
> > > > > -          * function again with a initial layout of
> > > > COLOR_ATTACHMENT_OPTIMAL
> > > > > -          * to perform a resolve.
> > > > > -          */
> > > > > -          anv_perf_warn(cmd_buffer->device->instance, image,
> > > > > -                        "Performing an additional resolve for CCS_D
> > > > layout "
> > > > > -                        "transition. Consider always leaving it on
> > or "
> > > > > -                        "performing an ambiguation pass.");
> > > > > -         transition_color_buffer(cmd_buffer, image, aspect,
> > > > > -                                 base_level, level_count,
> > > > > -                                 base_layer, layer_count,
> > > > > -                                 VK_IMAGE_LAYOUT_COLOR_
> > > > ATTACHMENT_OPTIMAL,
> > > > > -                                 final_layout);
> > > >
> > > > Nice to be rid of this.
> > > >
> > >
> > > Yup.
> > >
> > >
> > > > > +         for (unsigned level = base_level; level < last_level_num;
> > > > level++) {
> > > > > +            genX(set_image_needs_resolve)(cmd_buffer, image,
> > > > > +                                          aspect, level, true);
> > > > > +         }
> > > >
> > > > Why set needs resolve true here?
> > > >
> > >
> > > There's no particularly good reason.  I'll delete it.  We never do MCS
> > > resolves anyway.  When the time comes for me to go hook up MCS
> > fast-clears
> > > again, we can sort it out then.
> >


More information about the mesa-dev mailing list