<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Mon, Feb 5, 2018 at 5:41 PM, Nanley Chery <span dir="ltr"><<a href="mailto:nanleychery@gmail.com" target="_blank">nanleychery@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail-">On Fri, Jan 19, 2018 at 03:47:37PM -0800, Jason Ekstrand wrote:<br>
</span><div><div class="gmail-h5">> This commit completely reworks aux tracking.  This includes a number of<br>
> somewhat distinct changes:<br>
><br>
>  1) Since we are no longer fast-clearing multiple slices, we only need<br>
>     to track one fast clear color and one fast clear type.<br>
><br>
>  2) We store two bits for fast clear instead of one to let us<br>
>     distinguish between zero and non-zero fast clear colors.  This is<br>
>     needed so that we can do full resolves when transitioning to<br>
>     PRESENT_SRC_KHR with gen9 CCS images where we allow zero clear<br>
>     values in all sorts of places we wouldn't normally.<br>
><br>
>  3) We now track compression state as a boolean separate from fast clear<br>
>     type and this is tracked at per-slice granularity.<br>
><br>
> The previous scheme had some issues when it came to individual slices of<br>
> a multi-LOD image.  In particular, we only tracked "needs resolve"<br>
> per-LOD but you could do a vkCmdPipelineBarrier that would only resolve<br>
> a portion of the image and would set "needs resolve" to false anyway.<br>
> Also, any transition from an undefined layout would reset the clear<br>
> color for the entire LOD regardless of whether or not there was some<br>
> clear color on some other slice.<br>
><br>
> As far as full/partial resolves go, the assumptions of the previous<br>
> scheme held because the one case where we do need a full resolve when<br>
> CCS_E is enabled is for window-system images.  Since we only ever<br>
> allowed X-tiled window-system images, CCS was entirely disabled on gen9+<br>
> and we never got CCS_E.  With the advent of Y-tiled window-system<br>
> buffers, we now need to properly support doing a full resolve of images<br>
> marked CCS_E.<br>
> ---<br>
>  src/intel/vulkan/anv_blorp.c       |   3 +-<br>
>  src/intel/vulkan/anv_image.c       |  96 ++++++-----<br>
>  src/intel/vulkan/anv_private.h     |  53 +++---<br>
>  src/intel/vulkan/genX_cmd_<wbr>buffer.c | 340 +++++++++++++++++++++++++++---<wbr>-------<br>
>  4 files changed, 331 insertions(+), 161 deletions(-)<br>
><br>
> diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c<br>
> index 3698543..594b0d8 100644<br>
> --- a/src/intel/vulkan/anv_blorp.c<br>
> +++ b/src/intel/vulkan/anv_blorp.c<br>
> @@ -1757,8 +1757,7 @@ anv_image_ccs_op(struct anv_cmd_buffer *cmd_buffer,<br>
>         * particular value and don't care about format or clear value.<br>
>         */<br>
>        const struct anv_address clear_color_addr =<br>
> -         anv_image_get_clear_color_<wbr>addr(cmd_buffer->device, image,<br>
> -                                        aspect, level);<br>
> +         anv_image_get_clear_color_<wbr>addr(cmd_buffer->device, image, aspect);<br>
>        surf.clear_color_addr = anv_to_blorp_address(clear_<wbr>color_addr);<br>
>     }<br>
><br>
> diff --git a/src/intel/vulkan/anv_image.c b/src/intel/vulkan/anv_image.c<br>
> index 94b9ecb..d5f8dcf 100644<br>
> --- a/src/intel/vulkan/anv_image.c<br>
> +++ b/src/intel/vulkan/anv_image.c<br>
> @@ -190,46 +190,54 @@ all_formats_ccs_e_compatible(<wbr>const struct gen_device_info *devinfo,<br>
>   * fast-clear values in non-trivial cases (e.g., outside of a render pass in<br>
>   * which a fast clear has occurred).<br>
>   *<br>
> - * For the purpose of discoverability, the algorithm used to manage this buffer<br>
> - * is described here. A clear value in this buffer is updated when a fast clear<br>
> - * is performed on a subresource. One of two synchronization operations is<br>
> - * performed in order for a following memory access to use the fast-clear<br>
> - * value:<br>
> - *    a. Copy the value from the buffer to the surface state object used for<br>
> - *       reading. This is done implicitly when the value is the clear value<br>
> - *       predetermined to be the default in other surface state objects. This<br>
> - *       is currently only done explicitly for the operation below.<br>
> - *    b. Do (a) and use the surface state object to resolve the subresource.<br>
> - *       This is only done during layout transitions for decent performance.<br>
> + * In order to avoid having multiple clear colors for a single plane of an<br>
> + * image (hence a single RENDER_SURFACE_STATE), we only allow fast-clears on<br>
> + * the first slice (level 0, layer 0).  At the time of our testing (Jan 17,<br>
> + * 2018), there were no known applications which would benefit from fast-clearing<br>
> + * more than just the first slice.<br>
>   *<br>
> - * With the above scheme, we can fast-clear whenever the hardware allows except<br>
> - * for two cases in which synchronization becomes impossible or undesirable:<br>
> - *    * The subresource is in the GENERAL layout and is cleared to a value<br>
> - *      other than the special default value.<br>
> + * The fast clear portion of the image is laid out in the following order:<br>
>   *<br>
> - *      Performing a synchronization operation in order to read from the<br>
> - *      subresource is undesirable in this case. Firstly, b) is not an option<br>
> - *      because a layout transition isn't required between a write and read of<br>
> - *      an image in the GENERAL layout. Secondly, it's undesirable to do a)<br>
> - *      explicitly because it would require large infrastructural changes. The<br>
> - *      Vulkan API supports us in deciding not to optimize this layout by<br>
> - *      stating that using this layout may cause suboptimal performance. NOTE:<br>
> - *      the auxiliary buffer must always be enabled to support a) implicitly.<br>
> + *  * 1 or 4 dwords (depending on hardware generation) for the clear color<br>
> + *  * 1 dword for the anv_fast_clear_type of the clear color<br>
> + *  * On gen9+, 1 dword per level and layer of the image (3D levels count as<br>
> + *    having a single layer) in level-major order for compression state.<br>
>   *<br>
> + * For the purpose of discoverability, the algorithm used to manage<br>
> + * compression and fast-clears is described here:<br>
>   *<br>
> - *    * For the given miplevel, only some of the layers are cleared at once.<br>
> + *  * On a transition from UNDEFINED or PREINITIALIZED to a defined layout,<br>
> + *    all of the values in the fast clear portion of the image are initialized<br>
> + *    to default values.<br>
>   *<br>
> - *      If the user clears each layer to a different value, then tries to<br>
> - *      render to multiple layers at once, we have no ability to perform a<br>
> - *      synchronization operation in between. a) is not helpful because the<br>
> - *      object can only hold one clear value. b) is not an option because a<br>
> - *      layout transition isn't required in this case.<br>
> + *  * On fast-clear, the clear value is written into surface state and also<br>
> + *    into the buffer and the fast clear type is set appropriately.  Both<br>
> + *    setting the fast-clear value in the buffer and setting the fast-clear<br>
> + *    type happen from the GPU using MI commands.<br>
> + *<br>
> + *  * On pipeline barrier transitions, the worst-case transition is computed<br>
> + *    from the image layouts.  The command streamer inspects the fast clear<br>
> + *    type and compression state dwords and constructs a predicate.  The<br>
> + *    worst-case resolve is performed with the given predicate and the fast<br>
> + *    clear and compression state is set accordingly.<br>
> + *<br>
> + * See anv_layout_to_aux_usage and anv_layout_to_fast_clear_type functions for<br>
> + * details on exactly what is allowed in what layouts.<br>
> + *<br>
> + * On gen7-9, we do not have a concept of indirect clear colors in hardware.<br>
> + * In order to deal with this, we have to do some clear color management.<br>
> + *<br>
> + *  * For LOAD_OP_LOAD at the top of a renderpass, we have to copy the clear<br>
> + *    value from the buffer into the surface state with MI commands.<br>
> + *<br>
> + *  * For any blorp operations, we pass the address to the clear value into<br>
> + *    blorp and it knows to copy the clear color.<br>
>   */<br>
>  static void<br>
> -add_fast_clear_state_buffer(<wbr>struct anv_image *image,<br>
> -                            VkImageAspectFlagBits aspect,<br>
> -                            uint32_t plane,<br>
> -                            const struct anv_device *device)<br>
> +add_aux_state_tracking_<wbr>buffer(struct anv_image *image,<br>
> +                              VkImageAspectFlagBits aspect,<br>
> +                              uint32_t plane,<br>
> +                              const struct anv_device *device)<br>
>  {<br>
>     assert(image && device);<br>
>     assert(image->planes[plane].<wbr>aux_surface.isl.size > 0 &&<br>
> @@ -251,20 +259,20 @@ add_fast_clear_state_buffer(<wbr>struct anv_image *image,<br>
>               (image->planes[plane].offset + image->planes[plane].size));<br>
>     }<br>
><br>
> -   const unsigned entry_size = anv_fast_clear_state_entry_<wbr>size(device);<br>
> -   /* There's no padding between entries, so ensure that they're always a<br>
> -    * multiple of 32 bits in order to enable GPU memcpy operations.<br>
> -    */<br>
> -   assert(entry_size % 4 == 0);<br>
> +   /* Clear color and fast clear type */<br>
> +   unsigned state_size = device->isl_dev.ss.clear_<wbr>value_size + 4;<br>
><br>
> -   const unsigned plane_state_size =<br>
> -      entry_size * anv_image_aux_levels(image, aspect);<br>
> +   /* We only need to track compression on CCS_E surfaces.  We don't consider<br>
> +    * 3D images as actually having multiple array layers.<br>
> +    */<br>
> +   if (image->planes[plane].aux_<wbr>usage == ISL_AUX_USAGE_CCS_E)<br>
> +      state_size += image->levels * image->array_size;<br>
><br>
>     image->planes[plane].fast_<wbr>clear_state_offset =<br>
>        image->planes[plane].offset + image->planes[plane].size;<br>
><br>
> -   image->planes[plane].size += plane_state_size;<br>
> -   image->size += plane_state_size;<br>
> +   image->planes[plane].size += state_size;<br>
> +   image->size += state_size;<br>
>  }<br>
><br>
>  /**<br>
> @@ -439,7 +447,7 @@ make_surface(const struct anv_device *dev,<br>
>              }<br>
><br>
>              add_surface(image, &image->planes[plane].aux_<wbr>surface, plane);<br>
> -            add_fast_clear_state_buffer(<wbr>image, aspect, plane, dev);<br>
> +            add_aux_state_tracking_buffer(<wbr>image, aspect, plane, dev);<br>
><br>
>              /* For images created without MUTABLE_FORMAT_BIT set, we know that<br>
>               * they will always be used with the original format.  In<br>
> @@ -463,7 +471,7 @@ make_surface(const struct anv_device *dev,<br>
>                                   &image->planes[plane].aux_<wbr>surface.isl);<br>
>        if (ok) {<br>
>           add_surface(image, &image->planes[plane].aux_<wbr>surface, plane);<br>
> -         add_fast_clear_state_buffer(<wbr>image, aspect, plane, dev);<br>
> +         add_aux_state_tracking_buffer(<wbr>image, aspect, plane, dev);<br>
>           image->planes[plane].aux_usage = ISL_AUX_USAGE_MCS;<br>
>        }<br>
>     }<br>
> diff --git a/src/intel/vulkan/anv_<wbr>private.h b/src/intel/vulkan/anv_<wbr>private.h<br>
> index f0251e2..3d3a773 100644<br>
> --- a/src/intel/vulkan/anv_<wbr>private.h<br>
> +++ b/src/intel/vulkan/anv_<wbr>private.h<br>
> @@ -2483,50 +2483,51 @@ anv_image_aux_layers(const struct anv_image * const image,<br>
>     }<br>
>  }<br>
><br>
> -static inline unsigned<br>
> -anv_fast_clear_state_entry_<wbr>size(const struct anv_device *device)<br>
> -{<br>
> -   assert(device);<br>
> -   /* Entry contents:<br>
> -    *   +-----------------------------<wbr>---------------+<br>
> -    *   | clear value dword(s) | needs resolve dword |<br>
> -    *   +-----------------------------<wbr>---------------+<br>
> -    */<br>
> -<br>
> -   /* Ensure that the needs resolve dword is in fact dword-aligned to enable<br>
> -    * GPU memcpy operations.<br>
> -    */<br>
> -   assert(device->isl_dev.ss.<wbr>clear_value_size % 4 == 0);<br>
> -   return device->isl_dev.ss.clear_<wbr>value_size + 4;<br>
> -}<br>
> -<br>
>  static inline struct anv_address<br>
>  anv_image_get_clear_color_<wbr>addr(const struct anv_device *device,<br>
>                                 const struct anv_image *image,<br>
> -                               VkImageAspectFlagBits aspect,<br>
> -                               unsigned level)<br>
> +                               VkImageAspectFlagBits aspect)<br>
>  {<br>
> +   assert(image->aspects & VK_IMAGE_ASPECT_ANY_COLOR_BIT_<wbr>ANV);<br>
> +<br>
>     uint32_t plane = anv_image_aspect_to_plane(<wbr>image->aspects, aspect);<br>
>     return (struct anv_address) {<br>
>        .bo = image->planes[plane].bo,<br>
>        .offset = image->planes[plane].bo_offset +<br>
> -                image->planes[plane].fast_<wbr>clear_state_offset +<br>
> -                anv_fast_clear_state_entry_<wbr>size(device) * level,<br>
> +                image->planes[plane].fast_<wbr>clear_state_offset,<br>
>     };<br>
>  }<br>
><br>
>  static inline struct anv_address<br>
> -anv_image_get_needs_resolve_<wbr>addr(const struct anv_device *device,<br>
> -                                 const struct anv_image *image,<br>
> -                                 VkImageAspectFlagBits aspect,<br>
> -                                 unsigned level)<br>
> +anv_image_get_fast_clear_<wbr>type_addr(const struct anv_device *device,<br>
> +                                   const struct anv_image *image,<br>
> +                                   VkImageAspectFlagBits aspect)<br>
>  {<br>
>     struct anv_address addr =<br>
> -      anv_image_get_clear_color_<wbr>addr(device, image, aspect, level);<br>
> +      anv_image_get_clear_color_<wbr>addr(device, image, aspect);<br>
>     addr.offset += device->isl_dev.ss.clear_<wbr>value_size;<br>
>     return addr;<br>
>  }<br>
><br>
> +static inline struct anv_address<br>
> +anv_image_get_compression_<wbr>state_addr(const struct anv_device *device,<br>
> +                                     const struct anv_image *image,<br>
> +                                     VkImageAspectFlagBits aspect,<br>
> +                                     uint32_t level, uint32_t array_layer)<br>
> +{<br>
> +   assert(level < anv_image_aux_levels(image, aspect));<br>
> +   assert(array_layer < anv_image_aux_layers(image, aspect, level));<br>
> +   UNUSED uint32_t plane = anv_image_aspect_to_plane(<wbr>image->aspects, aspect);<br>
> +   assert(image->planes[plane].<wbr>aux_usage == ISL_AUX_USAGE_CCS_E);<br>
> +<br>
> +   struct anv_address addr =<br>
> +      anv_image_get_fast_clear_type_<wbr>addr(device, image, aspect);<br>
> +   addr.offset += 4; /* Go past the fast clear type */<br>
> +   addr.offset += level * image->array_size;<br>
> +   addr.offset += array_layer;<br>
> +   return addr;<br>
> +}<br>
> +<br>
>  /* Returns true if a HiZ-enabled depth buffer can be sampled from. */<br>
>  static inline bool<br>
>  anv_can_sample_with_hiz(const struct gen_device_info * const devinfo,<br>
> diff --git a/src/intel/vulkan/genX_cmd_<wbr>buffer.c b/src/intel/vulkan/genX_cmd_<wbr>buffer.c<br>
> index 15e805f..4c83a5c 100644<br>
> --- a/src/intel/vulkan/genX_cmd_<wbr>buffer.c<br>
> +++ b/src/intel/vulkan/genX_cmd_<wbr>buffer.c<br>
> @@ -407,27 +407,45 @@ transition_depth_buffer(struct anv_cmd_buffer *cmd_buffer,<br>
>  #define MI_PREDICATE_SRC0  0x2400<br>
>  #define MI_PREDICATE_SRC1  0x2408<br>
><br>
> -/* Manages the state of an color image subresource to ensure resolves are<br>
> - * performed properly.<br>
> - */<br>
>  static void<br>
> -genX(set_image_needs_resolve)<wbr>(struct anv_cmd_buffer *cmd_buffer,<br>
> -                        const struct anv_image *image,<br>
> -                        VkImageAspectFlagBits aspect,<br>
> -                        unsigned level, bool needs_resolve)<br>
> +set_image_fast_clear_state(<wbr>struct anv_cmd_buffer *cmd_buffer,<br>
> +                           const struct anv_image *image,<br>
> +                           VkImageAspectFlagBits aspect,<br>
> +                           enum anv_fast_clear_type fast_clear)<br>
>  {<br>
> -   assert(cmd_buffer && image);<br>
> -   assert(image->aspects & VK_IMAGE_ASPECT_ANY_COLOR_BIT_<wbr>ANV);<br>
> -   assert(level < anv_image_aux_levels(image, aspect));<br>
> -<br>
> -   /* The HW docs say that there is no way to guarantee the completion of<br>
> -    * the following command. We use it nevertheless because it shows no<br>
> -    * issues in testing is currently being used in the GL driver.<br>
> -    */<br>
>     anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(MI_STORE_DATA_IMM), sdi) {<br>
> -      sdi.Address = anv_image_get_needs_resolve_<wbr>addr(cmd_buffer->device,<br>
> -                                                     image, aspect, level);<br>
> -      sdi.ImmediateData = needs_resolve;<br>
> +      sdi.Address = anv_image_get_fast_clear_type_<wbr>addr(cmd_buffer->device,<br>
> +                                                       image, aspect);<br>
> +      sdi.ImmediateData = fast_clear;<br>
> +   }<br>
> +}<br>
> +<br>
> +static void<br>
> +set_image_compressed_bit(<wbr>struct anv_cmd_buffer *cmd_buffer,<br>
> +                         const struct anv_image *image,<br>
> +                         VkImageAspectFlagBits aspect,<br>
> +                         uint32_t level,<br>
> +                         uint32_t base_layer, uint32_t layer_count,<br>
> +                         bool compressed)<br>
> +{<br>
> +   /* We only have CCS_E on gen9+ */<br>
> +   if (GEN_GEN < 9)<br>
> +      return;<br>
> +<br>
> +   uint32_t plane = anv_image_aspect_to_plane(<wbr>image->aspects, aspect);<br>
> +<br>
> +   /* We only have compression tracking for CCS_E */<br>
> +   if (image->planes[plane].aux_<wbr>usage != ISL_AUX_USAGE_CCS_E)<br>
> +      return;<br>
> +<br>
> +   for (uint32_t a = 0; a < layer_count; a++) {<br>
> +      uint32_t layer = base_layer + a;<br>
> +      anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(MI_STORE_DATA_IMM), sdi) {<br>
> +         sdi.Address = anv_image_get_compression_<wbr>state_addr(cmd_buffer->device,<br>
> +                                                            image, aspect,<br>
> +                                                            level, layer);<br>
> +         sdi.ImmediateData = compressed ? UINT32_MAX : 0;<br>
> +      }<br>
>     }<br>
>  }<br>
><br>
> @@ -451,32 +469,172 @@ mi_alu(uint32_t opcode, uint32_t operand1, uint32_t operand2)<br>
>  #define CS_GPR(n) (0x2600 + (n) * 8)<br>
><br>
>  static void<br>
> -genX(load_needs_resolve_<wbr>predicate)(struct anv_cmd_buffer *cmd_buffer,<br>
> -                                   const struct anv_image *image,<br>
> -                                   VkImageAspectFlagBits aspect,<br>
> -                                   unsigned level)<br>
> +anv_cmd_predicated_ccs_<wbr>resolve(struct anv_cmd_buffer *cmd_buffer,<br>
> +                               const struct anv_image *image,<br>
> +                               VkImageAspectFlagBits aspect,<br>
> +                               uint32_t level, uint32_t array_layer,<br>
> +                               enum isl_aux_op resolve_op,<br>
> +                               enum anv_fast_clear_type fast_clear_supported)<br>
>  {<br>
> -   assert(cmd_buffer && image);<br>
> -   assert(image->aspects & VK_IMAGE_ASPECT_ANY_COLOR_BIT_<wbr>ANV);<br>
> -   assert(level < anv_image_aux_levels(image, aspect));<br>
> +   struct anv_address fast_clear_type_addr =<br>
> +      anv_image_get_fast_clear_type_<wbr>addr(cmd_buffer->device, image, aspect);<br>
> +<br>
> +#if GEN_GEN >= 9<br>
> +   const uint32_t plane = anv_image_aspect_to_plane(<wbr>image->aspects, aspect);<br>
> +   const bool decompress =<br>
> +      resolve_op == ISL_AUX_OP_FULL_RESOLVE &&<br>
> +      image->planes[plane].aux_usage == ISL_AUX_USAGE_CCS_E;<br>
> +<br>
> +   /* This function shouldn't get called if it isn't going to do anything */<br>
> +   assert(decompress || fast_clear_supported < ANV_FAST_CLEAR_ANY);<br>
><br>
> -   const struct anv_address resolve_flag_addr =<br>
> -      anv_image_get_needs_resolve_<wbr>addr(cmd_buffer->device,<br>
> -                                       image, aspect, level);<br>
> +   if (level == 0 && array_layer == 0) {<br>
> +      /* This is the complex case because we have to worry about dealing with<br>
> +       * the fast clear color.  Unfortunately, it's also the common case.<br>
> +       */<br>
> +<br>
> +      /* Poor-man's register allocation */<br>
> +      int next_reg = MI_ALU_REG0;<br>
> +      int pred_reg = -1;<br>
> +<br>
> +      /* Needed for ALU operations */<br>
> +      uint32_t *dw;<br>
> +<br>
> +      const int image_fc = next_reg++;<br>
> +      anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(MI_LOAD_REGISTER_MEM), lrm) {<br>
> +         lrm.RegisterAddress  = CS_GPR(image_fc);<br>
> +         lrm.MemoryAddress    = fast_clear_type_addr;<br>
> +      }<br>
> +      emit_lri(&cmd_buffer->batch, CS_GPR(image_fc) + 4, 0);<br>
> +<br>
> +      if (fast_clear_supported < ANV_FAST_CLEAR_ANY) {<br>
> +         /* We need to compute (fast_clear_supported < image->fast_clear).<br>
> +          * We do this by subtracting and storing the carry bit.<br>
> +          */<br>
> +         const int fc_imm = next_reg++;<br>
> +         emit_lri(&cmd_buffer->batch, CS_GPR(fc_imm), fast_clear_supported);<br>
> +         emit_lri(&cmd_buffer->batch, CS_GPR(fc_imm) + 4, 0);<br>
> +<br>
> +         assert(pred_reg == -1);<br>
> +         pred_reg = next_reg++;<br>
> +<br>
> +         dw = anv_batch_emitn(&cmd_buffer-><wbr>batch, 5, GENX(MI_MATH));<br>
> +         dw[1] = mi_alu(MI_ALU_LOAD, MI_ALU_SRCA, fc_imm);<br>
> +         dw[2] = mi_alu(MI_ALU_LOAD, MI_ALU_SRCB, image_fc);<br>
> +         dw[3] = mi_alu(MI_ALU_SUB, 0, 0);<br>
> +         dw[4] = mi_alu(MI_ALU_STORE, pred_reg, MI_ALU_CF);<br>
> +      }<br>
> +<br>
> +      if (decompress) {<br>
> +         /* We're doing a full resolve so we need the compression state */<br>
> +         struct anv_address compression_state_addr =<br>
> +            anv_image_get_compression_<wbr>state_addr(cmd_buffer->device, image,<br>
> +                                                 aspect, level, array_layer);<br>
> +         if (pred_reg == -1) {<br>
> +            pred_reg = next_reg++;<br>
> +            anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(MI_LOAD_REGISTER_MEM), lrm) {<br>
> +               lrm.RegisterAddress  = CS_GPR(pred_reg);<br>
> +               lrm.MemoryAddress    = compression_state_addr;<br>
> +            }<br>
> +         } else {<br>
> +            /* OR the compression state into the predicate.  The compression<br>
> +             * state is already in 0/~0 form.<br>
> +             */<br>
> +            const int image_comp = next_reg++;<br>
> +            anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(MI_LOAD_REGISTER_MEM), lrm) {<br>
> +               lrm.RegisterAddress  = CS_GPR(image_comp);<br>
> +               lrm.MemoryAddress    = compression_state_addr;<br>
> +            }<br>
><br>
> -   /* Make the pending predicated resolve a no-op if one is not needed.<br>
> -    * predicate = do_resolve = resolve_flag != 0;<br>
> +            dw = anv_batch_emitn(&cmd_buffer-><wbr>batch, 5, GENX(MI_MATH));<br>
> +            dw[1] = mi_alu(MI_ALU_LOAD, MI_ALU_SRCA, pred_reg);<br>
> +            dw[2] = mi_alu(MI_ALU_LOAD, MI_ALU_SRCB, image_comp);<br>
> +            dw[3] = mi_alu(MI_ALU_OR, 0, 0);<br>
> +            dw[4] = mi_alu(MI_ALU_STORE, pred_reg, MI_ALU_ACCU);<br>
> +         }<br>
> +<br>
> +         anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(MI_STORE_DATA_IMM), sdi) {<br>
> +            sdi.Address = compression_state_addr;<br>
> +            sdi.ImmediateData = 0;<br>
> +         }<br>
> +      }<br>
> +<br>
> +      /* Store the predicate */<br>
> +      assert(pred_reg != -1);<br>
> +      emit_lrr(&cmd_buffer->batch, MI_PREDICATE_SRC0, CS_GPR(pred_reg));<br>
> +<br>
> +      /* If the predicate is true, we want to write 0 to the fast clear type<br>
> +       * and, if it's false, leave it alone.  We can do this by writing<br>
> +       *<br>
> +       * clear_type = clear_type & ~predicate;<br>
> +       */<br>
> +      dw = anv_batch_emitn(&cmd_buffer-><wbr>batch, 5, GENX(MI_MATH));<br>
> +      dw[1] = mi_alu(MI_ALU_LOAD, MI_ALU_SRCA, image_fc);<br>
> +      dw[2] = mi_alu(MI_ALU_LOADINV, MI_ALU_SRCB, pred_reg);<br>
> +      dw[3] = mi_alu(MI_ALU_AND, 0, 0);<br>
> +      dw[4] = mi_alu(MI_ALU_STORE, image_fc, MI_ALU_ACCU);<br>
> +<br>
> +      anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(MI_STORE_REGISTER_MEM), srm) {<br>
> +         srm.RegisterAddress  = CS_GPR(image_fc);<br>
> +         srm.MemoryAddress    = fast_clear_type_addr;<br>
> +      }<br>
> +   } else if (decompress) {<br>
> +      /* We're trying to get rid of compression but we don't care about fast<br>
> +       * clears so all we need is the compression predicate.<br>
> +       */<br>
> +      assert(resolve_op == ISL_AUX_OP_FULL_RESOLVE);<br>
> +      struct anv_address compression_state_addr =<br>
> +         anv_image_get_compression_<wbr>state_addr(cmd_buffer->device, image,<br>
> +                                              aspect, level, array_layer);<br>
> +      anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(MI_LOAD_REGISTER_MEM), lrm) {<br>
> +         lrm.RegisterAddress  = MI_PREDICATE_SRC0;<br>
> +         lrm.MemoryAddress    = compression_state_addr;<br>
> +      }<br>
> +      anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(MI_STORE_DATA_IMM), sdi) {<br>
> +         sdi.Address = compression_state_addr;<br>
> +         sdi.ImmediateData = 0;<br>
> +      }<br>
> +   } else {<br>
> +      /* In this case, we're trying to do a partial resolve on a slice that<br>
> +       * doesn't have clear color.  There's nothing to do.<br>
> +       */<br>
> +      return;<br>
> +   }<br>
> +<br>
> +#else /* GEN_GEN <= 8 */<br>
> +   assert(resolve_op == ISL_AUX_OP_FULL_RESOLVE);<br>
> +   assert(fast_clear_supported != ANV_FAST_CLEAR_ANY);<br>
> +<br>
> +   /* On gen8, we don't have a concept of default clear colors because we<br>
> +    * can't sample from CCS surfaces.  It's enough to just load the fast clear<br>
> +    * state into the predicate register.<br>
> +    */<br>
> +   emit_lrm(&cmd_buffer->batch, MI_PREDICATE_SRC0,<br>
> +            fast_clear_type_addr.bo, fast_clear_type_addr.offset);<br>
> +#endif<br>
> +<br>
> +   /* We use the first half of src0 for the actual predicate.  Set the second<br>
> +    * half of src0 and all of src1 to 0 as the predicate operation will be<br>
> +    * doing an implicit src0 != src1.<br>
>      */<br>
> +   emit_lri(&cmd_buffer->batch, MI_PREDICATE_SRC0 + 4, 0);<br>
>     emit_lri(&cmd_buffer->batch, MI_PREDICATE_SRC1    , 0);<br>
>     emit_lri(&cmd_buffer->batch, MI_PREDICATE_SRC1 + 4, 0);<br>
> -   emit_lri(&cmd_buffer->batch, MI_PREDICATE_SRC0    , 0);<br>
> -   emit_lrm(&cmd_buffer->batch, MI_PREDICATE_SRC0 + 4,<br>
> -            resolve_flag_addr.bo, resolve_flag_addr.offset);<br>
> +<br>
>     anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(MI_PREDICATE), mip) {<br>
>        mip.LoadOperation    = LOAD_LOADINV;<br>
>        mip.CombineOperation = COMBINE_SET;<br>
>        mip.CompareOperation = COMPARE_SRCS_EQUAL;<br>
>     }<br>
> +<br>
> +   if (image->type == VK_IMAGE_TYPE_3D) {<br>
> +      anv_image_ccs_op(cmd_buffer, image, aspect, level,<br>
> +                       0, anv_minify(image->extent.<wbr>depth, level),<br>
> +                       resolve_op, true);<br>
> +   } else {<br>
> +      anv_image_ccs_op(cmd_buffer, image, aspect, level,<br>
> +                       array_layer, 1, resolve_op, true);<br>
> +   }<br>
>  }<br>
><br>
>  void<br>
> @@ -490,17 +648,35 @@ genX(cmd_buffer_mark_image_<wbr>written)(struct anv_cmd_buffer *cmd_buffer,<br>
>  {<br>
>     /* The aspect must be exactly one of the image aspects. */<br>
>     assert(_mesa_bitcount(aspect) == 1 && (aspect & image->aspects));<br>
> +<br>
> +   /* The only compression types with more than just fast-clears are MCS,<br>
> +    * CCS_E, and HiZ.  With HiZ we just trust the layout and don't actually<br>
> +    * track the current fast-clear and compression state.  This leaves us<br>
> +    * with just MCS and CCS_E.<br>
> +    */<br>
> +   if (aux_usage != ISL_AUX_USAGE_CCS_E &&<br>
> +       aux_usage != ISL_AUX_USAGE_MCS)<br>
> +      return;<br>
> +<br>
> +   if (image->type == VK_IMAGE_TYPE_3D) {<br>
> +      base_layer = 0;<br>
> +      layer_count = 1;<br>
> +   }<br>
> +<br>
> +   set_image_compressed_bit(cmd_<wbr>buffer, image, aspect,<br>
> +                            level, base_layer, layer_count, true);<br>
>  }<br>
><br>
>  static void<br>
> -init_fast_clear_state_entry(<wbr>struct anv_cmd_buffer *cmd_buffer,<br>
> -                            const struct anv_image *image,<br>
> -                            VkImageAspectFlagBits aspect,<br>
> -                            unsigned level)<br>
> +init_fast_clear_color(struct anv_cmd_buffer *cmd_buffer,<br>
> +                      const struct anv_image *image,<br>
> +                      VkImageAspectFlagBits aspect)<br>
>  {<br>
>     assert(cmd_buffer && image);<br>
>     assert(image->aspects & VK_IMAGE_ASPECT_ANY_COLOR_BIT_<wbr>ANV);<br>
> -   assert(level < anv_image_aux_levels(image, aspect));<br>
> +<br>
> +   set_image_fast_clear_state(<wbr>cmd_buffer, image, aspect,<br>
> +                              ANV_FAST_CLEAR_NONE);<br>
><br>
>     uint32_t plane = anv_image_aspect_to_plane(<wbr>image->aspects, aspect);<br>
>     enum isl_aux_usage aux_usage = image->planes[plane].aux_<wbr>usage;<br>
> @@ -517,7 +693,7 @@ init_fast_clear_state_entry(<wbr>struct anv_cmd_buffer *cmd_buffer,<br>
>      * values in the clear value dword(s).<br>
>      */<br>
>     struct anv_address addr =<br>
> -      anv_image_get_clear_color_<wbr>addr(cmd_buffer->device, image, aspect, level);<br>
> +      anv_image_get_clear_color_<wbr>addr(cmd_buffer->device, image, aspect);<br>
>     unsigned i = 0;<br>
>     for (; i < cmd_buffer->device->isl_dev.<wbr>ss.clear_value_size; i += 4) {<br>
>        anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(MI_STORE_DATA_IMM), sdi) {<br>
> @@ -558,19 +734,17 @@ genX(copy_fast_clear_dwords)(<wbr>struct anv_cmd_buffer *cmd_buffer,<br>
>                               struct anv_state surface_state,<br>
>                               const struct anv_image *image,<br>
>                               VkImageAspectFlagBits aspect,<br>
> -                             unsigned level,<br>
>                               bool copy_from_surface_state)<br>
>  {<br>
>     assert(cmd_buffer && image);<br>
>     assert(image->aspects & VK_IMAGE_ASPECT_ANY_COLOR_BIT_<wbr>ANV);<br>
> -   assert(level < anv_image_aux_levels(image, aspect));<br>
><br>
>     struct anv_bo *ss_bo =<br>
>        &cmd_buffer->device->surface_state_pool.block_pool.bo;<br>
>     uint32_t ss_clear_offset = surface_state.offset +<br>
>        cmd_buffer->device->isl_dev.<wbr>ss.clear_value_offset;<br>
>     const struct anv_address entry_addr =<br>
> -      anv_image_get_clear_color_<wbr>addr(cmd_buffer->device, image, aspect, level);<br>
> +      anv_image_get_clear_color_<wbr>addr(cmd_buffer->device, image, aspect);<br>
>     unsigned copy_size = cmd_buffer->device->isl_dev.<wbr>ss.clear_value_size;<br>
><br>
>     if (copy_from_surface_state) {<br>
> @@ -657,20 +831,11 @@ transition_color_buffer(struct anv_cmd_buffer *cmd_buffer,<br>
>                                 base_layer, layer_count);<br>
>     }<br>
><br>
> -   if (base_layer >= anv_image_aux_layers(image, aspect, base_level))<br>
> -      return;<br>
> -<br>
> -   /* A transition of a 3D subresource works on all slices at a time. */<br>
> -   if (image->type == VK_IMAGE_TYPE_3D) {<br>
> +   if (image->type == VK_IMAGE_TYPE_3D)<br>
>        base_layer = 0;<br>
> -      layer_count = anv_minify(image->extent.<wbr>depth, base_level);<br>
> -   }<br>
><br>
> -   /* We're interested in the subresource range subset that has aux data. */<br>
> -   level_count = MIN2(level_count, anv_image_aux_levels(image, aspect) - base_level);<br>
<br>
</div></div>By deleting this line, we lose some flexibility. If we later choose to<br>
enable CCS_D on gen7 for the first level of a mipmapped surface,</blockquote><div><br>But are we ever going to do that?  This function is complex enough without having to worry about cases we don't yet support but might some day.<br> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> we'll<br>
start asserting in the ambiguation pass later on in this function.<span class="gmail-"><br></span></blockquote><div><br></div><div>That's a very easy bug to fix.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail-">
> -   layer_count = MIN2(layer_count,<br>
> -                      anv_image_aux_layers(image, aspect, base_level) - base_layer);<br>
> -   last_level_num = base_level + level_count;<br>
<br>
</span>I have to think more about these deletions as well.<br>
<span class="gmail-HOEnZb"><font color="#888888"><br>
-Nanley<br>
</font></span><div class="gmail-HOEnZb"><div class="gmail-h5"><br>
> +   if (base_layer >= anv_image_aux_layers(image, aspect, base_level))<br>
> +      return;<br>
><br>
>     assert(image->tiling == VK_IMAGE_TILING_OPTIMAL);<br>
><br>
> @@ -684,8 +849,8 @@ transition_color_buffer(struct anv_cmd_buffer *cmd_buffer,<br>
>         *<br>
>         * Initialize the relevant clear buffer entries.<br>
>         */<br>
> -      for (unsigned level = base_level; level < last_level_num; level++)<br>
> -         init_fast_clear_state_entry(<wbr>cmd_buffer, image, aspect, level);<br>
> +      if (base_level == 0 && base_layer == 0)<br>
> +         init_fast_clear_color(cmd_<wbr>buffer, image, aspect);<br>
><br>
>        /* Initialize the aux buffers to enable correct rendering.  In order to<br>
>         * ensure that things such as storage images work correctly, aux buffers<br>
> @@ -701,16 +866,26 @@ transition_color_buffer(struct anv_cmd_buffer *cmd_buffer,<br>
>         * We don't have any data to show that this is a problem, but we want to<br>
>         * avoid causing difficult-to-debug problems.<br>
>         */<br>
> +<br>
>        if (image->samples == 1) {<br>
>           for (uint32_t l = 0; l < level_count; l++) {<br>
>              const uint32_t level = base_level + l;<br>
> -            const uint32_t level_layer_count =<br>
> +            uint32_t level_layer_count =<br>
>                 MIN2(layer_count, anv_image_aux_layers(image, aspect, level));<br>
> +<br>
> +            /* A transition of a 3D subresource works on all slices. */<br>
> +            if (image->type == VK_IMAGE_TYPE_3D)<br>
> +               level_layer_count = anv_minify(image->extent.<wbr>depth, level);<br>
> +<br>
>              anv_image_ccs_op(cmd_buffer, image, aspect, level,<br>
>                               base_layer, level_layer_count,<br>
>                               ISL_AUX_OP_AMBIGUATE, false);<br>
> -            genX(set_image_needs_resolve)(<wbr>cmd_buffer, image,<br>
> -                                          aspect, level, false);<br>
> +<br>
> +            if (image->planes[plane].aux_<wbr>usage == ISL_AUX_USAGE_CCS_E) {<br>
> +               set_image_compressed_bit(cmd_<wbr>buffer, image, aspect,<br>
> +                                        level, base_layer, level_layer_count,<br>
> +                                        false);<br>
> +            }<br>
>           }<br>
>        } else {<br>
>           if (image->samples == 4 || image->samples == 16) {<br>
> @@ -723,10 +898,6 @@ transition_color_buffer(struct anv_cmd_buffer *cmd_buffer,<br>
>           anv_image_mcs_op(cmd_buffer, image, aspect,<br>
>                            base_layer, layer_count,<br>
>                            ISL_AUX_OP_FAST_CLEAR, false);<br>
> -         for (unsigned level = base_level; level < last_level_num; level++) {<br>
> -            genX(set_image_needs_resolve)(<wbr>cmd_buffer, image,<br>
> -                                          aspect, level, true);<br>
> -         }<br>
>        }<br>
>        return;<br>
>     }<br>
> @@ -793,19 +964,14 @@ transition_color_buffer(struct anv_cmd_buffer *cmd_buffer,<br>
>     cmd_buffer->state.pending_<wbr>pipe_bits |=<br>
>        ANV_PIPE_RENDER_TARGET_CACHE_<wbr>FLUSH_BIT | ANV_PIPE_CS_STALL_BIT;<br>
><br>
> -   for (uint32_t level = base_level; level < last_level_num; level++) {<br>
> -<br>
> -      /* The number of layers changes at each 3D miplevel. */<br>
> -      if (image->type == VK_IMAGE_TYPE_3D) {<br>
> -         layer_count = MIN2(layer_count, anv_image_aux_layers(image, aspect, level));<br>
> +   for (uint32_t l = 0; l < level_count; l++) {<br>
> +      uint32_t level = base_level + l;<br>
> +      for (uint32_t a = 0; a < layer_count; a++) {<br>
> +         uint32_t array_layer = base_layer + a;<br>
> +         anv_cmd_predicated_ccs_<wbr>resolve(cmd_buffer, image, aspect,<br>
> +                                        level, array_layer, resolve_op,<br>
> +                                        final_fast_clear);<br>
>        }<br>
> -<br>
> -      genX(load_needs_resolve_<wbr>predicate)(cmd_buffer, image, aspect, level);<br>
> -<br>
> -      anv_image_ccs_op(cmd_buffer, image, aspect, level,<br>
> -                       base_layer, layer_count, resolve_op, true);<br>
> -<br>
> -      genX(set_image_needs_resolve)(<wbr>cmd_buffer, image, aspect, level, false);<br>
>     }<br>
><br>
>     cmd_buffer->state.pending_<wbr>pipe_bits |=<br>
> @@ -3132,28 +3298,26 @@ cmd_buffer_subpass_sync_fast_<wbr>clear_values(struct anv_cmd_buffer *cmd_buffer)<br>
>           genX(copy_fast_clear_dwords)(<wbr>cmd_buffer, att_state->color.state,<br>
>                                        iview->image,<br>
>                                        VK_IMAGE_ASPECT_COLOR_BIT,<br>
> -                                      iview->planes[0].isl.base_<wbr>level,<br>
>                                        true /* copy from ss */);<br>
><br>
>           /* Fast-clears impact whether or not a resolve will be necessary. */<br>
> -         if (iview->image->planes[0].aux_<wbr>usage == ISL_AUX_USAGE_CCS_E &&<br>
> -             att_state->clear_color_is_<wbr>zero) {<br>
> +         if (att_state->clear_color_is_<wbr>zero) {<br>
>              /* This image always has the auxiliary buffer enabled. We can mark<br>
>               * the subresource as not needing a resolve because the clear color<br>
>               * will match what's in every RENDER_SURFACE_STATE object when it's<br>
>               * being used for sampling.<br>
>               */<br>
> -            genX(set_image_needs_resolve)(<wbr>cmd_buffer, iview->image,<br>
> -                                          VK_IMAGE_ASPECT_COLOR_BIT,<br>
> -                                          iview->planes[0].isl.base_<wbr>level,<br>
> -                                          false);<br>
> +            set_image_fast_clear_state(<wbr>cmd_buffer, iview->image,<br>
> +                                       VK_IMAGE_ASPECT_COLOR_BIT,<br>
> +                                       ANV_FAST_CLEAR_ZERO_ONLY);<br>
>           } else {<br>
> -            genX(set_image_needs_resolve)(<wbr>cmd_buffer, iview->image,<br>
> -                                          VK_IMAGE_ASPECT_COLOR_BIT,<br>
> -                                          iview->planes[0].isl.base_<wbr>level,<br>
> -                                          true);<br>
> +            set_image_fast_clear_state(<wbr>cmd_buffer, iview->image,<br>
> +                                       VK_IMAGE_ASPECT_COLOR_BIT,<br>
> +                                       ANV_FAST_CLEAR_ANY);<br>
>           }<br>
> -      } else if (rp_att->load_op == VK_ATTACHMENT_LOAD_OP_LOAD) {<br>
> +      } else if (rp_att->load_op == VK_ATTACHMENT_LOAD_OP_LOAD &&<br>
> +                 iview->planes[0].isl.base_<wbr>level == 0 &&<br>
> +                 iview->planes[0].isl.base_<wbr>array_layer == 0) {<br>
>           /* The attachment may have been fast-cleared in a previous render<br>
>            * pass and the value is needed now. Update the surface state(s).<br>
>            *<br>
> @@ -3162,7 +3326,6 @@ cmd_buffer_subpass_sync_fast_<wbr>clear_values(struct anv_cmd_buffer *cmd_buffer)<br>
>           genX(copy_fast_clear_dwords)(<wbr>cmd_buffer, att_state->color.state,<br>
>                                        iview->image,<br>
>                                        VK_IMAGE_ASPECT_COLOR_BIT,<br>
> -                                      iview->planes[0].isl.base_<wbr>level,<br>
>                                        false /* copy to ss */);<br>
><br>
>           if (need_input_attachment_state(<wbr>rp_att) &&<br>
> @@ -3170,7 +3333,6 @@ cmd_buffer_subpass_sync_fast_<wbr>clear_values(struct anv_cmd_buffer *cmd_buffer)<br>
>              genX(copy_fast_clear_dwords)(<wbr>cmd_buffer, att_state->input.state,<br>
>                                           iview->image,<br>
>                                           VK_IMAGE_ASPECT_COLOR_BIT,<br>
> -                                         iview->planes[0].isl.base_<wbr>level,<br>
>                                           false /* copy to ss */);<br>
>           }<br>
>        }<br>
> --<br>
> 2.5.0.400.gff86faf<br>
><br>
</div></div><div class="gmail-HOEnZb"><div class="gmail-h5">> ______________________________<wbr>_________________<br>
> mesa-dev mailing list<br>
> <a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a><br>
> <a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/<wbr>mailman/listinfo/mesa-dev</a><br>
</div></div></blockquote></div><br></div></div>