<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Mon, Feb 5, 2018 at 5:41 PM, Nanley Chery <span dir="ltr"><<a href="mailto:nanleychery@gmail.com" target="_blank">nanleychery@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail-">On Fri, Jan 19, 2018 at 03:47:37PM -0800, Jason Ekstrand wrote:<br>
</span><div><div class="gmail-h5">> This commit completely reworks aux tracking. This includes a number of<br>
> somewhat distinct changes:<br>
><br>
> 1) Since we are no longer fast-clearing multiple slices, we only need<br>
> to track one fast clear color and one fast clear type.<br>
><br>
> 2) We store two bits for fast clear instead of one to let us<br>
> distinguish between zero and non-zero fast clear colors. This is<br>
> needed so that we can do full resolves when transitioning to<br>
> PRESENT_SRC_KHR with gen9 CCS images where we allow zero clear<br>
> values in all sorts of places we wouldn't normally.<br>
><br>
> 3) We now track compression state as a boolean separate from fast clear<br>
> type and this is tracked on a per-slice granularity.<br>
><br>
> The previous scheme had some issues when it came to individual slices of<br>
> multi-LOD images. In particular, we only tracked "needs resolve"<br>
> per-LOD but you could do a vkCmdPipelineBarrier that would only resolve<br>
> a portion of the image and would set "needs resolve" to false anyway.<br>
> Also, any transition from an undefined layout would reset the clear<br>
> color for the entire LOD regardless of whether or not there was some<br>
> clear color on some other slice.<br>
><br>
> As far as full/partial resolves go, the assumptions of the previous<br>
> scheme held because the one case where we do need a full resolve when<br>
> CCS_E is enabled is for window-system images. Since we only ever<br>
> allowed X-tiled window-system images, CCS was entirely disabled on gen9+<br>
> and we never got CCS_E. With the advent of Y-tiled window-system<br>
> buffers, we now need to properly support doing a full resolve of images<br>
> marked CCS_E.<br>
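A minimal CPU-side sketch of the tracking model in points 1-3, assuming a gen9+ CCS_E image: one fast clear type per image, fast clears only on slice (0, 0), and one compressed flag per slice. All names (model_image, MODEL_FAST_CLEAR_ZERO, and so on) are hypothetical, the fixed level/layer limits are arbitrary, and the real driver keeps this state in GPU memory and updates it with MI commands rather than in a C struct.<br>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ordering: more permissive fast clear types compare greater,
 * because resolves are decided with a plain '<' comparison. */
enum model_fast_clear_type {
   MODEL_FAST_CLEAR_NONE = 0, /* no outstanding fast clear */
   MODEL_FAST_CLEAR_ZERO = 1, /* fast-cleared to a zero color */
   MODEL_FAST_CLEAR_ANY  = 2, /* fast-cleared to an arbitrary color */
};

#define MODEL_MAX_LEVELS 4
#define MODEL_MAX_LAYERS 4

/* One fast clear type per image (point 1) and one compressed flag per
 * slice (point 3). */
struct model_image {
   enum model_fast_clear_type fast_clear;
   bool compressed[MODEL_MAX_LEVELS][MODEL_MAX_LAYERS];
};

/* Transition from UNDEFINED: no fast clear, nothing compressed. */
static void model_init(struct model_image *img)
{
   img->fast_clear = MODEL_FAST_CLEAR_NONE;
   for (uint32_t l = 0; l < MODEL_MAX_LEVELS; l++)
      for (uint32_t a = 0; a < MODEL_MAX_LAYERS; a++)
         img->compressed[l][a] = false;
}

/* Fast clears only ever happen on slice (0, 0). */
static void model_fast_clear(struct model_image *img, bool zero_color)
{
   img->fast_clear = zero_color ? MODEL_FAST_CLEAR_ZERO : MODEL_FAST_CLEAR_ANY;
}

/* Rendering to a slice may leave compressed data behind. */
static void model_mark_written(struct model_image *img,
                               uint32_t level, uint32_t layer)
{
   assert(level < MODEL_MAX_LEVELS && layer < MODEL_MAX_LAYERS);
   img->compressed[level][layer] = true;
}

/* Resolve one slice for a layout that tolerates fast clears up to
 * `supported`. Returns true if the resolve would actually do work. */
static bool model_resolve(struct model_image *img, uint32_t level,
                          uint32_t layer, bool full_resolve,
                          enum model_fast_clear_type supported)
{
   assert(level < MODEL_MAX_LEVELS && layer < MODEL_MAX_LAYERS);
   const bool first_slice = level == 0 && layer == 0;

   /* Only slice (0, 0) can carry a fast clear. */
   bool needed = first_slice && supported < img->fast_clear;
   /* A full resolve also has to remove compression on this slice. */
   if (full_resolve)
      needed = needed || img->compressed[level][layer];

   if (first_slice && needed)
      img->fast_clear = MODEL_FAST_CLEAR_NONE;
   if (full_resolve)
      img->compressed[level][layer] = false;
   return needed;
}
```

The enum ordering matters because, in this model as in the patch, whether a fast clear must be resolved comes down to a less-than comparison against the type the target layout supports.<br>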
> ---<br>
> src/intel/vulkan/anv_blorp.c | 3 +-<br>
> src/intel/vulkan/anv_image.c | 96 ++++++-----<br>
> src/intel/vulkan/anv_private.h | 53 +++---<br>
> src/intel/vulkan/genX_cmd_<wbr>buffer.c | 340 +++++++++++++++++++++++++++---<wbr>-------<br>
> 4 files changed, 331 insertions(+), 161 deletions(-)<br>
><br>
> diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c<br>
> index 3698543..594b0d8 100644<br>
> --- a/src/intel/vulkan/anv_blorp.c<br>
> +++ b/src/intel/vulkan/anv_blorp.c<br>
> @@ -1757,8 +1757,7 @@ anv_image_ccs_op(struct anv_cmd_buffer *cmd_buffer,<br>
> * particular value and don't care about format or clear value.<br>
> */<br>
> const struct anv_address clear_color_addr =<br>
> - anv_image_get_clear_color_<wbr>addr(cmd_buffer->device, image,<br>
> - aspect, level);<br>
> + anv_image_get_clear_color_<wbr>addr(cmd_buffer->device, image, aspect);<br>
> surf.clear_color_addr = anv_to_blorp_address(clear_<wbr>color_addr);<br>
> }<br>
><br>
> diff --git a/src/intel/vulkan/anv_image.c b/src/intel/vulkan/anv_image.c<br>
> index 94b9ecb..d5f8dcf 100644<br>
> --- a/src/intel/vulkan/anv_image.c<br>
> +++ b/src/intel/vulkan/anv_image.c<br>
> @@ -190,46 +190,54 @@ all_formats_ccs_e_compatible(<wbr>const struct gen_device_info *devinfo,<br>
> * fast-clear values in non-trivial cases (e.g., outside of a render pass in<br>
> * which a fast clear has occurred).<br>
> *<br>
> - * For the purpose of discoverability, the algorithm used to manage this buffer<br>
> - * is described here. A clear value in this buffer is updated when a fast clear<br>
> - * is performed on a subresource. One of two synchronization operations is<br>
> - * performed in order for a following memory access to use the fast-clear<br>
> - * value:<br>
> - * a. Copy the value from the buffer to the surface state object used for<br>
> - * reading. This is done implicitly when the value is the clear value<br>
> - * predetermined to be the default in other surface state objects. This<br>
> - * is currently only done explicitly for the operation below.<br>
> - * b. Do (a) and use the surface state object to resolve the subresource.<br>
> - * This is only done during layout transitions for decent performance.<br>
> + * In order to avoid having multiple clear colors for a single plane of an<br>
> + * image (hence a single RENDER_SURFACE_STATE), we only allow fast-clears on<br>
> + * the first slice (level 0, layer 0). At the time of our testing (Jan 17,<br>
> + * 2018), there were no known applications which would benefit from fast-clearing<br>
> + * more than just the first slice.<br>
> *<br>
> - * With the above scheme, we can fast-clear whenever the hardware allows except<br>
> - * for two cases in which synchronization becomes impossible or undesirable:<br>
> - * * The subresource is in the GENERAL layout and is cleared to a value<br>
> - * other than the special default value.<br>
> + * The fast clear portion of the image is laid out in the following order:<br>
> *<br>
> - * Performing a synchronization operation in order to read from the<br>
> - * subresource is undesirable in this case. Firstly, b) is not an option<br>
> - * because a layout transition isn't required between a write and read of<br>
> - * an image in the GENERAL layout. Secondly, it's undesirable to do a)<br>
> - * explicitly because it would require large infrastructural changes. The<br>
> - * Vulkan API supports us in deciding not to optimize this layout by<br>
> - * stating that using this layout may cause suboptimal performance. NOTE:<br>
> - * the auxiliary buffer must always be enabled to support a) implicitly.<br>
> + * * 1 or 4 dwords (depending on hardware generation) for the clear color<br>
> + * * 1 dword for the anv_fast_clear_type of the clear color<br>
> + * * On gen9+, 1 dword per level and layer of the image (3D levels count as<br>
> + * having a single layer) in level-major order for compression state.<br>
> *<br>
> + * For the purpose of discoverability, the algorithm used to manage<br>
> + * compression and fast-clears is described here:<br>
> *<br>
> - * * For the given miplevel, only some of the layers are cleared at once.<br>
> + * * On a transition from UNDEFINED or PREINITIALIZED to a defined layout,<br>
> + * all of the values in the fast clear portion of the image are initialized<br>
> + * to default values.<br>
> *<br>
> - * If the user clears each layer to a different value, then tries to<br>
> - * render to multiple layers at once, we have no ability to perform a<br>
> - * synchronization operation in between. a) is not helpful because the<br>
> - * object can only hold one clear value. b) is not an option because a<br>
> - * layout transition isn't required in this case.<br>
> + * * On fast-clear, the clear value is written into surface state and also<br>
> + * into the buffer and the fast clear type is set appropriately. Both<br>
> + * setting the fast-clear value in the buffer and setting the fast-clear<br>
> + * type happen from the GPU using MI commands.<br>
> + *<br>
> + * * On pipeline barrier transitions, the worst-case transition is computed<br>
> + * from the image layouts. The command streamer inspects the fast clear<br>
> + * type and compression state dwords and constructs a predicate. The<br>
> + * worst-case resolve is performed with the given predicate and the fast<br>
> + * clear and compression state is set accordingly.<br>
> + *<br>
> + * See anv_layout_to_aux_usage and anv_layout_to_fast_clear_type functions for<br>
> + * details on exactly what is allowed in what layouts.<br>
> + *<br>
> + * On gen7-9, we do not have a concept of indirect clear colors in hardware.<br>
> + * In order to deal with this, we have to do some clear color management.<br>
> + *<br>
> + * * For LOAD_OP_LOAD at the top of a renderpass, we have to copy the clear<br>
> + * value from the buffer into the surface state with MI commands.<br>
> + *<br>
> + * * For any blorp operations, we pass the address to the clear value into<br>
> + * blorp and it knows to copy the clear color.<br>
> */<br>
> static void<br>
> -add_fast_clear_state_buffer(<wbr>struct anv_image *image,<br>
> - VkImageAspectFlagBits aspect,<br>
> - uint32_t plane,<br>
> - const struct anv_device *device)<br>
> +add_aux_state_tracking_<wbr>buffer(struct anv_image *image,<br>
> + VkImageAspectFlagBits aspect,<br>
> + uint32_t plane,<br>
> + const struct anv_device *device)<br>
> {<br>
> assert(image && device);<br>
> assert(image->planes[plane].<wbr>aux_surface.isl.size > 0 &&<br>
> @@ -251,20 +259,20 @@ add_fast_clear_state_buffer(<wbr>struct anv_image *image,<br>
> (image->planes[plane].offset + image->planes[plane].size));<br>
> }<br>
><br>
> - const unsigned entry_size = anv_fast_clear_state_entry_<wbr>size(device);<br>
> - /* There's no padding between entries, so ensure that they're always a<br>
> - * multiple of 32 bits in order to enable GPU memcpy operations.<br>
> - */<br>
> - assert(entry_size % 4 == 0);<br>
> + /* Clear color and fast clear type */<br>
> + unsigned state_size = device->isl_dev.ss.clear_<wbr>value_size + 4;<br>
><br>
> - const unsigned plane_state_size =<br>
> - entry_size * anv_image_aux_levels(image, aspect);<br>
> + /* We only need to track compression on CCS_E surfaces. We don't consider<br>
> + * 3D images as actually having multiple array layers.<br>
> + */<br>
> + if (image->planes[plane].aux_<wbr>usage == ISL_AUX_USAGE_CCS_E)<br>
> + state_size += image->levels * image->array_size;<br>
><br>
> image->planes[plane].fast_<wbr>clear_state_offset =<br>
> image->planes[plane].offset + image->planes[plane].size;<br>
><br>
> - image->planes[plane].size += plane_state_size;<br>
> - image->size += plane_state_size;<br>
> + image->planes[plane].size += state_size;<br>
> + image->size += state_size;<br>
> }<br>
><br>
> /**<br>
> @@ -439,7 +447,7 @@ make_surface(const struct anv_device *dev,<br>
> }<br>
><br>
> add_surface(image, &image->planes[plane].aux_<wbr>surface, plane);<br>
> - add_fast_clear_state_buffer(<wbr>image, aspect, plane, dev);<br>
> + add_aux_state_tracking_buffer(<wbr>image, aspect, plane, dev);<br>
><br>
> /* For images created without MUTABLE_FORMAT_BIT set, we know that<br>
> * they will always be used with the original format. In<br>
> @@ -463,7 +471,7 @@ make_surface(const struct anv_device *dev,<br>
> &image->planes[plane].aux_<wbr>surface.isl);<br>
> if (ok) {<br>
> add_surface(image, &image->planes[plane].aux_<wbr>surface, plane);<br>
> - add_fast_clear_state_buffer(<wbr>image, aspect, plane, dev);<br>
> + add_aux_state_tracking_buffer(<wbr>image, aspect, plane, dev);<br>
> image->planes[plane].aux_usage = ISL_AUX_USAGE_MCS;<br>
> }<br>
> }<br>
> diff --git a/src/intel/vulkan/anv_<wbr>private.h b/src/intel/vulkan/anv_<wbr>private.h<br>
> index f0251e2..3d3a773 100644<br>
> --- a/src/intel/vulkan/anv_<wbr>private.h<br>
> +++ b/src/intel/vulkan/anv_<wbr>private.h<br>
> @@ -2483,50 +2483,51 @@ anv_image_aux_layers(const struct anv_image * const image,<br>
> }<br>
> }<br>
><br>
> -static inline unsigned<br>
> -anv_fast_clear_state_entry_<wbr>size(const struct anv_device *device)<br>
> -{<br>
> - assert(device);<br>
> - /* Entry contents:<br>
> - * +-----------------------------<wbr>---------------+<br>
> - * | clear value dword(s) | needs resolve dword |<br>
> - * +-----------------------------<wbr>---------------+<br>
> - */<br>
> -<br>
> - /* Ensure that the needs resolve dword is in fact dword-aligned to enable<br>
> - * GPU memcpy operations.<br>
> - */<br>
> - assert(device->isl_dev.ss.<wbr>clear_value_size % 4 == 0);<br>
> - return device->isl_dev.ss.clear_<wbr>value_size + 4;<br>
> -}<br>
> -<br>
> static inline struct anv_address<br>
> anv_image_get_clear_color_<wbr>addr(const struct anv_device *device,<br>
> const struct anv_image *image,<br>
> - VkImageAspectFlagBits aspect,<br>
> - unsigned level)<br>
> + VkImageAspectFlagBits aspect)<br>
> {<br>
> + assert(image->aspects & VK_IMAGE_ASPECT_ANY_COLOR_BIT_<wbr>ANV);<br>
> +<br>
> uint32_t plane = anv_image_aspect_to_plane(<wbr>image->aspects, aspect);<br>
> return (struct anv_address) {<br>
> .bo = image->planes[plane].bo,<br>
> .offset = image->planes[plane].bo_offset +<br>
> - image->planes[plane].fast_<wbr>clear_state_offset +<br>
> - anv_fast_clear_state_entry_<wbr>size(device) * level,<br>
> + image->planes[plane].fast_<wbr>clear_state_offset,<br>
> };<br>
> }<br>
><br>
> static inline struct anv_address<br>
> -anv_image_get_needs_resolve_<wbr>addr(const struct anv_device *device,<br>
> - const struct anv_image *image,<br>
> - VkImageAspectFlagBits aspect,<br>
> - unsigned level)<br>
> +anv_image_get_fast_clear_<wbr>type_addr(const struct anv_device *device,<br>
> + const struct anv_image *image,<br>
> + VkImageAspectFlagBits aspect)<br>
> {<br>
> struct anv_address addr =<br>
> - anv_image_get_clear_color_<wbr>addr(device, image, aspect, level);<br>
> + anv_image_get_clear_color_<wbr>addr(device, image, aspect);<br>
> addr.offset += device->isl_dev.ss.clear_<wbr>value_size;<br>
> return addr;<br>
> }<br>
><br>
> +static inline struct anv_address<br>
> +anv_image_get_compression_<wbr>state_addr(const struct anv_device *device,<br>
> + const struct anv_image *image,<br>
> + VkImageAspectFlagBits aspect,<br>
> + uint32_t level, uint32_t array_layer)<br>
> +{<br>
> + assert(level < anv_image_aux_levels(image, aspect));<br>
> + assert(array_layer < anv_image_aux_layers(image, aspect, level));<br>
> + UNUSED uint32_t plane = anv_image_aspect_to_plane(<wbr>image->aspects, aspect);<br>
> + assert(image->planes[plane].<wbr>aux_usage == ISL_AUX_USAGE_CCS_E);<br>
> +<br>
> + struct anv_address addr =<br>
> + anv_image_get_fast_clear_type_<wbr>addr(device, image, aspect);<br>
> + addr.offset += 4; /* Go past the fast clear type */<br>
> + addr.offset += level * image->array_size;<br>
> + addr.offset += array_layer;<br>
> + return addr;<br>
> +}<br>
> +<br>
> /* Returns true if a HiZ-enabled depth buffer can be sampled from. */<br>
> static inline bool<br>
> anv_can_sample_with_hiz(const struct gen_device_info * const devinfo,<br>
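A sketch of the offset arithmetic implied by the layout comment in anv_image.c, with offsets measured from fast_clear_state_offset. The struct and helper names are hypothetical. Since each compression-state entry is described as one dword and the offsets are in bytes, the level/layer terms are multiplied by 4 here; as quoted, add_aux_state_tracking_buffer() and anv_image_get_compression_state_addr() add those counts without that factor, which may be worth double-checking.<br>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one plane's aux state tracking buffer. */
struct aux_state_layout {
   uint32_t clear_value_size; /* 4 or 16 bytes (1 or 4 dwords) */
   uint32_t levels;
   uint32_t array_size;       /* 3D levels count as a single layer */
   bool ccs_e;                /* compression state only exists for CCS_E */
};

/* Byte offset of the fast clear type dword; the clear color sits at 0. */
static uint32_t fast_clear_type_offset(const struct aux_state_layout *l)
{
   return l->clear_value_size;
}

/* Byte offset of the compression-state dword for (level, layer), stored
 * level-major right after the fast clear type. */
static uint32_t compression_state_offset(const struct aux_state_layout *l,
                                         uint32_t level, uint32_t layer)
{
   assert(l->ccs_e);
   assert(level < l->levels && layer < l->array_size);
   return fast_clear_type_offset(l) + 4 +
          (level * l->array_size + layer) * 4;
}

/* Total bytes to reserve after the aux surface. */
static uint32_t aux_state_size(const struct aux_state_layout *l)
{
   uint32_t size = l->clear_value_size + 4;
   if (l->ccs_e)
      size += l->levels * l->array_size * 4;
   return size;
}
```

With a 16-byte clear color, 3 levels and 2 layers, the last compression dword starts at byte 40 and the buffer is 44 bytes long.<br>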
> diff --git a/src/intel/vulkan/genX_cmd_<wbr>buffer.c b/src/intel/vulkan/genX_cmd_<wbr>buffer.c<br>
> index 15e805f..4c83a5c 100644<br>
> --- a/src/intel/vulkan/genX_cmd_<wbr>buffer.c<br>
> +++ b/src/intel/vulkan/genX_cmd_<wbr>buffer.c<br>
> @@ -407,27 +407,45 @@ transition_depth_buffer(struct anv_cmd_buffer *cmd_buffer,<br>
> #define MI_PREDICATE_SRC0 0x2400<br>
> #define MI_PREDICATE_SRC1 0x2408<br>
><br>
> -/* Manages the state of an color image subresource to ensure resolves are<br>
> - * performed properly.<br>
> - */<br>
> static void<br>
> -genX(set_image_needs_resolve)<wbr>(struct anv_cmd_buffer *cmd_buffer,<br>
> - const struct anv_image *image,<br>
> - VkImageAspectFlagBits aspect,<br>
> - unsigned level, bool needs_resolve)<br>
> +set_image_fast_clear_state(<wbr>struct anv_cmd_buffer *cmd_buffer,<br>
> + const struct anv_image *image,<br>
> + VkImageAspectFlagBits aspect,<br>
> + enum anv_fast_clear_type fast_clear)<br>
> {<br>
> - assert(cmd_buffer && image);<br>
> - assert(image->aspects & VK_IMAGE_ASPECT_ANY_COLOR_BIT_<wbr>ANV);<br>
> - assert(level < anv_image_aux_levels(image, aspect));<br>
> -<br>
> - /* The HW docs say that there is no way to guarantee the completion of<br>
> - * the following command. We use it nevertheless because it shows no<br>
> - * issues in testing is currently being used in the GL driver.<br>
> - */<br>
> anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(MI_STORE_DATA_IMM), sdi) {<br>
> - sdi.Address = anv_image_get_needs_resolve_<wbr>addr(cmd_buffer->device,<br>
> - image, aspect, level);<br>
> - sdi.ImmediateData = needs_resolve;<br>
> + sdi.Address = anv_image_get_fast_clear_type_<wbr>addr(cmd_buffer->device,<br>
> + image, aspect);<br>
> + sdi.ImmediateData = fast_clear;<br>
> + }<br>
> +}<br>
> +<br>
> +static void<br>
> +set_image_compressed_bit(<wbr>struct anv_cmd_buffer *cmd_buffer,<br>
> + const struct anv_image *image,<br>
> + VkImageAspectFlagBits aspect,<br>
> + uint32_t level,<br>
> + uint32_t base_layer, uint32_t layer_count,<br>
> + bool compressed)<br>
> +{<br>
> + /* We only have CCS_E on gen9+ */<br>
> + if (GEN_GEN < 9)<br>
> + return;<br>
> +<br>
> + uint32_t plane = anv_image_aspect_to_plane(<wbr>image->aspects, aspect);<br>
> +<br>
> + /* We only have compression tracking for CCS_E */<br>
> + if (image->planes[plane].aux_<wbr>usage != ISL_AUX_USAGE_CCS_E)<br>
> + return;<br>
> +<br>
> + for (uint32_t a = 0; a < layer_count; a++) {<br>
> + uint32_t layer = base_layer + a;<br>
> + anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(MI_STORE_DATA_IMM), sdi) {<br>
> + sdi.Address = anv_image_get_compression_<wbr>state_addr(cmd_buffer->device,<br>
> + image, aspect,<br>
> + level, layer);<br>
> + sdi.ImmediateData = compressed ? UINT32_MAX : 0;<br>
> + }<br>
> }<br>
> }<br>
><br>
> @@ -451,32 +469,172 @@ mi_alu(uint32_t opcode, uint32_t operand1, uint32_t operand2)<br>
> #define CS_GPR(n) (0x2600 + (n) * 8)<br>
><br>
> static void<br>
> -genX(load_needs_resolve_<wbr>predicate)(struct anv_cmd_buffer *cmd_buffer,<br>
> - const struct anv_image *image,<br>
> - VkImageAspectFlagBits aspect,<br>
> - unsigned level)<br>
> +anv_cmd_predicated_ccs_<wbr>resolve(struct anv_cmd_buffer *cmd_buffer,<br>
> + const struct anv_image *image,<br>
> + VkImageAspectFlagBits aspect,<br>
> + uint32_t level, uint32_t array_layer,<br>
> + enum isl_aux_op resolve_op,<br>
> + enum anv_fast_clear_type fast_clear_supported)<br>
> {<br>
> - assert(cmd_buffer && image);<br>
> - assert(image->aspects & VK_IMAGE_ASPECT_ANY_COLOR_BIT_<wbr>ANV);<br>
> - assert(level < anv_image_aux_levels(image, aspect));<br>
> + struct anv_address fast_clear_type_addr =<br>
> + anv_image_get_fast_clear_type_<wbr>addr(cmd_buffer->device, image, aspect);<br>
> +<br>
> +#if GEN_GEN >= 9<br>
> + const uint32_t plane = anv_image_aspect_to_plane(<wbr>image->aspects, aspect);<br>
> + const bool decompress =<br>
> + resolve_op == ISL_AUX_OP_FULL_RESOLVE &&<br>
> + image->planes[plane].aux_usage == ISL_AUX_USAGE_CCS_E;<br>
> +<br>
> + /* This function shouldn't get called if it isn't going to do anything */<br>
> + assert(decompress || fast_clear_supported < ANV_FAST_CLEAR_ANY);<br>
><br>
> - const struct anv_address resolve_flag_addr =<br>
> - anv_image_get_needs_resolve_<wbr>addr(cmd_buffer->device,<br>
> - image, aspect, level);<br>
> + if (level == 0 && array_layer == 0) {<br>
> + /* This is the complex case because we have to worry about dealing with<br>
> + * the fast clear color. Unfortunately, it's also the common case.<br>
> + */<br>
> +<br>
> + /* Poor-man's register allocation */<br>
> + int next_reg = MI_ALU_REG0;<br>
> + int pred_reg = -1;<br>
> +<br>
> + /* Needed for ALU operations */<br>
> + uint32_t *dw;<br>
> +<br>
> + const int image_fc = next_reg++;<br>
> + anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(MI_LOAD_REGISTER_MEM), lrm) {<br>
> + lrm.RegisterAddress = CS_GPR(image_fc);<br>
> + lrm.MemoryAddress = fast_clear_type_addr;<br>
> + }<br>
> + emit_lri(&cmd_buffer->batch, CS_GPR(image_fc) + 4, 0);<br>
> +<br>
> + if (fast_clear_supported < ANV_FAST_CLEAR_ANY) {<br>
> + /* We need to compute (fast_clear_supported < image->fast_clear).<br>
> + * We do this by subtracting and storing the carry bit.<br>
> + */<br>
> + const int fc_imm = next_reg++;<br>
> + emit_lri(&cmd_buffer->batch, CS_GPR(fc_imm), fast_clear_supported);<br>
> + emit_lri(&cmd_buffer->batch, CS_GPR(fc_imm) + 4, 0);<br>
> +<br>
> + assert(pred_reg == -1);<br>
> + pred_reg = next_reg++;<br>
> +<br>
> + dw = anv_batch_emitn(&cmd_buffer-><wbr>batch, 5, GENX(MI_MATH));<br>
> + dw[1] = mi_alu(MI_ALU_LOAD, MI_ALU_SRCA, fc_imm);<br>
> + dw[2] = mi_alu(MI_ALU_LOAD, MI_ALU_SRCB, image_fc);<br>
> + dw[3] = mi_alu(MI_ALU_SUB, 0, 0);<br>
> + dw[4] = mi_alu(MI_ALU_STORE, pred_reg, MI_ALU_CF);<br>
> + }<br>
> +<br>
> + if (decompress) {<br>
> + /* We're doing a full resolve, so we need the compression state */<br>
> + struct anv_address compression_state_addr =<br>
> + anv_image_get_compression_<wbr>state_addr(cmd_buffer->device, image,<br>
> + aspect, level, array_layer);<br>
> + if (pred_reg == -1) {<br>
> + pred_reg = next_reg++;<br>
> + anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(MI_LOAD_REGISTER_MEM), lrm) {<br>
> + lrm.RegisterAddress = CS_GPR(pred_reg);<br>
> + lrm.MemoryAddress = compression_state_addr;<br>
> + }<br>
> + } else {<br>
> + /* OR the compression state into the predicate. The compression<br>
> + * state is already in 0/~0 form.<br>
> + */<br>
> + const int image_comp = next_reg++;<br>
> + anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(MI_LOAD_REGISTER_MEM), lrm) {<br>
> + lrm.RegisterAddress = CS_GPR(image_comp);<br>
> + lrm.MemoryAddress = compression_state_addr;<br>
> + }<br>
><br>
> - /* Make the pending predicated resolve a no-op if one is not needed.<br>
> - * predicate = do_resolve = resolve_flag != 0;<br>
> + dw = anv_batch_emitn(&cmd_buffer-><wbr>batch, 5, GENX(MI_MATH));<br>
> + dw[1] = mi_alu(MI_ALU_LOAD, MI_ALU_SRCA, pred_reg);<br>
> + dw[2] = mi_alu(MI_ALU_LOAD, MI_ALU_SRCB, image_comp);<br>
> + dw[3] = mi_alu(MI_ALU_OR, 0, 0);<br>
> + dw[4] = mi_alu(MI_ALU_STORE, pred_reg, MI_ALU_ACCU);<br>
> + }<br>
> +<br>
> + anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(MI_STORE_DATA_IMM), sdi) {<br>
> + sdi.Address = compression_state_addr;<br>
> + sdi.ImmediateData = 0;<br>
> + }<br>
> + }<br>
> +<br>
> + /* Store the predicate */<br>
> + assert(pred_reg != -1);<br>
> + emit_lrr(&cmd_buffer->batch, MI_PREDICATE_SRC0, CS_GPR(pred_reg));<br>
> +<br>
> + /* If the predicate is true, we want to write 0 to the fast clear type<br>
> + * and, if it's false, leave it alone. We can do this by writing<br>
> + *<br>
> + * clear_type = clear_type & ~predicate;<br>
> + */<br>
> + dw = anv_batch_emitn(&cmd_buffer-><wbr>batch, 5, GENX(MI_MATH));<br>
> + dw[1] = mi_alu(MI_ALU_LOAD, MI_ALU_SRCA, image_fc);<br>
> + dw[2] = mi_alu(MI_ALU_LOADINV, MI_ALU_SRCB, pred_reg);<br>
> + dw[3] = mi_alu(MI_ALU_AND, 0, 0);<br>
> + dw[4] = mi_alu(MI_ALU_STORE, image_fc, MI_ALU_ACCU);<br>
> +<br>
> + anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(MI_STORE_REGISTER_MEM), srm) {<br>
> + srm.RegisterAddress = CS_GPR(image_fc);<br>
> + srm.MemoryAddress = fast_clear_type_addr;<br>
> + }<br>
> + } else if (decompress) {<br>
> + /* We're trying to get rid of compression but we don't care about fast<br>
> + * clears so all we need is the compression predicate.<br>
> + */<br>
> + assert(resolve_op == ISL_AUX_OP_FULL_RESOLVE);<br>
> + struct anv_address compression_state_addr =<br>
> + anv_image_get_compression_<wbr>state_addr(cmd_buffer->device, image,<br>
> + aspect, level, array_layer);<br>
> + anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(MI_LOAD_REGISTER_MEM), lrm) {<br>
> + lrm.RegisterAddress = MI_PREDICATE_SRC0;<br>
> + lrm.MemoryAddress = compression_state_addr;<br>
> + }<br>
> + anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(MI_STORE_DATA_IMM), sdi) {<br>
> + sdi.Address = compression_state_addr;<br>
> + sdi.ImmediateData = 0;<br>
> + }<br>
> + } else {<br>
> + /* In this case, we're trying to do a partial resolve on a slice that<br>
> + * doesn't have a clear color. There's nothing to do.<br>
> + */<br>
> + return;<br>
> + }<br>
> +<br>
> +#else /* GEN_GEN <= 8 */<br>
> + assert(resolve_op == ISL_AUX_OP_FULL_RESOLVE);<br>
> + assert(fast_clear_supported != ANV_FAST_CLEAR_ANY);<br>
> +<br>
> + /* On gen8, we don't have a concept of default clear colors because we<br>
> + * can't sample from CCS surfaces. It's enough to just load the fast clear<br>
> + * state into the predicate register.<br>
> + */<br>
> + emit_lrm(&cmd_buffer->batch, MI_PREDICATE_SRC0,<br>
> + fast_clear_type_addr.bo, fast_clear_type_addr.offset);<br>
> +#endif<br>
> +<br>
> + /* We use the first half of src0 for the actual predicate. Set the second<br>
> + * half of src0 and all of src1 to 0 as the predicate operation will be<br>
> + * doing an implicit src0 != src1.<br>
> */<br>
> + emit_lri(&cmd_buffer->batch, MI_PREDICATE_SRC0 + 4, 0);<br>
> emit_lri(&cmd_buffer->batch, MI_PREDICATE_SRC1 , 0);<br>
> emit_lri(&cmd_buffer->batch, MI_PREDICATE_SRC1 + 4, 0);<br>
> - emit_lri(&cmd_buffer->batch, MI_PREDICATE_SRC0 , 0);<br>
> - emit_lrm(&cmd_buffer->batch, MI_PREDICATE_SRC0 + 4,<br>
> - resolve_flag_addr.bo, resolve_flag_addr.offset);<br>
> +<br>
> anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(MI_PREDICATE), mip) {<br>
> mip.LoadOperation = LOAD_LOADINV;<br>
> mip.CombineOperation = COMBINE_SET;<br>
> mip.CompareOperation = COMPARE_SRCS_EQUAL;<br>
> }<br>
> +<br>
> + if (image->type == VK_IMAGE_TYPE_3D) {<br>
> + anv_image_ccs_op(cmd_buffer, image, aspect, level,<br>
> + 0, anv_minify(image->extent.<wbr>depth, level),<br>
> + resolve_op, true);<br>
> + } else {<br>
> + anv_image_ccs_op(cmd_buffer, image, aspect, level,<br>
> + array_layer, 1, resolve_op, true);<br>
> + }<br>
> }<br>
><br>
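A CPU model of what the gen9+ slice-(0, 0) path of anv_cmd_predicated_ccs_resolve() computes with MI_MATH and MI_PREDICATE, with uint64_t values standing in for the command streamer GPRs. The function and struct names are hypothetical, and it assumes the ALU stores the carry flag as all ones when set, which the clear_type &amp; ~predicate step appears to rely on.<br>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type. */
struct model_predicate {
   bool do_resolve;         /* what MI_PREDICATE evaluates to */
   uint64_t new_fast_clear; /* value stored back to the fast clear type */
};

static struct model_predicate
model_ccs_resolve_predicate(uint64_t image_fc,
                            uint64_t fast_clear_supported,
                            bool check_fast_clear, /* supported < ANY */
                            bool decompress,
                            uint32_t compression_state /* 0 or ~0 */)
{
   uint64_t pred = 0;

   /* MI_ALU_SUB of supported - image_fc; assumed to leave the carry
    * (borrow) flag as all ones exactly when supported < image_fc. */
   if (check_fast_clear)
      pred = fast_clear_supported < image_fc ? ~UINT64_C(0) : 0;

   /* OR in the per-slice compression state, already in 0/~0 form. */
   if (decompress)
      pred |= compression_state;

   struct model_predicate r;
   /* clear_type = clear_type & ~predicate */
   r.new_fast_clear = image_fc & ~pred;
   /* Only the low dword of SRC0 holds the predicate and SRC1 is 0, so
    * LOAD_LOADINV with COMPARE_SRCS_EQUAL yields SRC0 != SRC1. */
   r.do_resolve = (pred & 0xffffffffu) != 0;
   return r;
}
```

For example, an image fast-cleared to an arbitrary color (type 2) going to a layout that supports no fast clears (type 0) yields do_resolve == true and resets the stored fast clear type to 0.<br>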
> void<br>
> @@ -490,17 +648,35 @@ genX(cmd_buffer_mark_image_<wbr>written)(struct anv_cmd_buffer *cmd_buffer,<br>
> {<br>
> /* The aspect must be exactly one of the image aspects. */<br>
> assert(_mesa_bitcount(aspect) == 1 && (aspect & image->aspects));<br>
> +<br>
> + /* The only compression types with more than just fast-clears are MCS,<br>
> + * CCS_E, and HiZ. With HiZ we just trust the layout and don't actually<br>
> + * track the current fast-clear and compression state. This leaves us<br>
> + * with just MCS and CCS_E.<br>
> + */<br>
> + if (aux_usage != ISL_AUX_USAGE_CCS_E &&<br>
> + aux_usage != ISL_AUX_USAGE_MCS)<br>
> + return;<br>
> +<br>
> + if (image->type == VK_IMAGE_TYPE_3D) {<br>
> + base_layer = 0;<br>
> + layer_count = 1;<br>
> + }<br>
> +<br>
> + set_image_compressed_bit(cmd_<wbr>buffer, image, aspect,<br>
> + level, base_layer, layer_count, true);<br>
> }<br>
><br>
> static void<br>
> -init_fast_clear_state_entry(<wbr>struct anv_cmd_buffer *cmd_buffer,<br>
> - const struct anv_image *image,<br>
> - VkImageAspectFlagBits aspect,<br>
> - unsigned level)<br>
> +init_fast_clear_color(struct anv_cmd_buffer *cmd_buffer,<br>
> + const struct anv_image *image,<br>
> + VkImageAspectFlagBits aspect)<br>
> {<br>
> assert(cmd_buffer && image);<br>
> assert(image->aspects & VK_IMAGE_ASPECT_ANY_COLOR_BIT_<wbr>ANV);<br>
> - assert(level < anv_image_aux_levels(image, aspect));<br>
> +<br>
> + set_image_fast_clear_state(<wbr>cmd_buffer, image, aspect,<br>
> + ANV_FAST_CLEAR_NONE);<br>
><br>
> uint32_t plane = anv_image_aspect_to_plane(<wbr>image->aspects, aspect);<br>
> enum isl_aux_usage aux_usage = image->planes[plane].aux_<wbr>usage;<br>
> @@ -517,7 +693,7 @@ init_fast_clear_state_entry(<wbr>struct anv_cmd_buffer *cmd_buffer,<br>
> * values in the clear value dword(s).<br>
> */<br>
> struct anv_address addr =<br>
> - anv_image_get_clear_color_<wbr>addr(cmd_buffer->device, image, aspect, level);<br>
> + anv_image_get_clear_color_<wbr>addr(cmd_buffer->device, image, aspect);<br>
> unsigned i = 0;<br>
> for (; i < cmd_buffer->device->isl_dev.<wbr>ss.clear_value_size; i += 4) {<br>
> anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(MI_STORE_DATA_IMM), sdi) {<br>
> @@ -558,19 +734,17 @@ genX(copy_fast_clear_dwords)(<wbr>struct anv_cmd_buffer *cmd_buffer,<br>
> struct anv_state surface_state,<br>
> const struct anv_image *image,<br>
> VkImageAspectFlagBits aspect,<br>
> - unsigned level,<br>
> bool copy_from_surface_state)<br>
> {<br>
> assert(cmd_buffer && image);<br>
> assert(image->aspects & VK_IMAGE_ASPECT_ANY_COLOR_BIT_<wbr>ANV);<br>
> - assert(level < anv_image_aux_levels(image, aspect));<br>
><br>
> struct anv_bo *ss_bo =<br>
> &cmd_buffer->device->surface_state_pool.block_pool.bo;<br>
> uint32_t ss_clear_offset = surface_state.offset +<br>
> cmd_buffer->device->isl_dev.<wbr>ss.clear_value_offset;<br>
> const struct anv_address entry_addr =<br>
> - anv_image_get_clear_color_<wbr>addr(cmd_buffer->device, image, aspect, level);<br>
> + anv_image_get_clear_color_<wbr>addr(cmd_buffer->device, image, aspect);<br>
> unsigned copy_size = cmd_buffer->device->isl_dev.<wbr>ss.clear_value_size;<br>
><br>
> if (copy_from_surface_state) {<br>
> @@ -657,20 +831,11 @@ transition_color_buffer(struct anv_cmd_buffer *cmd_buffer,<br>
> base_layer, layer_count);<br>
> }<br>
><br>
> - if (base_layer >= anv_image_aux_layers(image, aspect, base_level))<br>
> - return;<br>
> -<br>
> - /* A transition of a 3D subresource works on all slices at a time. */<br>
> - if (image->type == VK_IMAGE_TYPE_3D) {<br>
> + if (image->type == VK_IMAGE_TYPE_3D)<br>
> base_layer = 0;<br>
> - layer_count = anv_minify(image->extent.<wbr>depth, base_level);<br>
> - }<br>
><br>
> - /* We're interested in the subresource range subset that has aux data. */<br>
> - level_count = MIN2(level_count, anv_image_aux_levels(image, aspect) - base_level);<br>
<br>
</div></div>By deleting this line, we lose some flexibility. If we later choose to<br>
enable CCS_D on gen7 for the first level of a mipmapped surface,</blockquote><div><br>But are we ever going to do that? This function is complex enough without having to worry about cases we don't yet support but might some day.<br> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> we'll<br>
start asserting in the ambiguation pass later on in this function.<span class="gmail-"><br></span></blockquote><div><br></div><div>That's a very easy bug to fix.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail-">
> - layer_count = MIN2(layer_count,<br>
> - anv_image_aux_layers(image, aspect, base_level) - base_layer);<br>
> - last_level_num = base_level + level_count;<br>
<br>
</span>I have to think more about these deletions as well.<br>
<span class="gmail-HOEnZb"><font color="#888888"><br>
-Nanley<br>
</font></span><div class="gmail-HOEnZb"><div class="gmail-h5"><br>
> + if (base_layer >= anv_image_aux_layers(image, aspect, base_level))<br>
> + return;<br>
><br>
> assert(image->tiling == VK_IMAGE_TILING_OPTIMAL);<br>
><br>
> @@ -684,8 +849,8 @@ transition_color_buffer(struct anv_cmd_buffer *cmd_buffer,<br>
> *<br>
> * Initialize the relevant clear buffer entries.<br>
> */<br>
> - for (unsigned level = base_level; level < last_level_num; level++)<br>
> - init_fast_clear_state_entry(<wbr>cmd_buffer, image, aspect, level);<br>
> + if (base_level == 0 && base_layer == 0)<br>
> + init_fast_clear_color(cmd_<wbr>buffer, image, aspect);<br>
><br>
> /* Initialize the aux buffers to enable correct rendering. In order to<br>
> * ensure that things such as storage images work correctly, aux buffers<br>
> @@ -701,16 +866,26 @@ transition_color_buffer(struct anv_cmd_buffer *cmd_buffer,<br>
> * We don't have any data to show that this is a problem, but we want to<br>
> * avoid causing difficult-to-debug problems.<br>
> */<br>
> +<br>
> if (image->samples == 1) {<br>
> for (uint32_t l = 0; l < level_count; l++) {<br>
> const uint32_t level = base_level + l;<br>
> - const uint32_t level_layer_count =<br>
> + uint32_t level_layer_count =<br>
> MIN2(layer_count, anv_image_aux_layers(image, aspect, level));<br>
> +<br>
> + /* A transition of a 3D subresource works on all slices. */<br>
> + if (image->type == VK_IMAGE_TYPE_3D)<br>
> + level_layer_count = anv_minify(image->extent.<wbr>depth, level);<br>
> +<br>
> anv_image_ccs_op(cmd_buffer, image, aspect, level,<br>
> base_layer, level_layer_count,<br>
> ISL_AUX_OP_AMBIGUATE, false);<br>
> - genX(set_image_needs_resolve)(<wbr>cmd_buffer, image,<br>
> - aspect, level, false);<br>
> +<br>
> + if (image->planes[plane].aux_usage == ISL_AUX_USAGE_CCS_E) {<br>
> + set_image_compressed_bit(cmd_buffer, image, aspect,<br>
> + level, base_layer, level_layer_count,<br>
> + false);<br>
> + }<br>
> }<br>
> } else {<br>
> if (image->samples == 4 || image->samples == 16) {<br>
> @@ -723,10 +898,6 @@ transition_color_buffer(struct anv_cmd_buffer *cmd_buffer,<br>
> anv_image_mcs_op(cmd_buffer, image, aspect,<br>
> base_layer, layer_count,<br>
> ISL_AUX_OP_FAST_CLEAR, false);<br>
> - for (unsigned level = base_level; level < last_level_num; level++) {<br>
> - genX(set_image_needs_resolve)(cmd_buffer, image,<br>
> - aspect, level, true);<br>
> - }<br>
> }<br>
> return;<br>
> }<br>
> @@ -793,19 +964,14 @@ transition_color_buffer(struct anv_cmd_buffer *cmd_buffer,<br>
> cmd_buffer->state.pending_pipe_bits |=<br>
> ANV_PIPE_RENDER_TARGET_CACHE_FLUSH_BIT | ANV_PIPE_CS_STALL_BIT;<br>
><br>
> - for (uint32_t level = base_level; level < last_level_num; level++) {<br>
> -<br>
> - /* The number of layers changes at each 3D miplevel. */<br>
> - if (image->type == VK_IMAGE_TYPE_3D) {<br>
> - layer_count = MIN2(layer_count, anv_image_aux_layers(image, aspect, level));<br>
> + for (uint32_t l = 0; l < level_count; l++) {<br>
> + uint32_t level = base_level + l;<br>
> + for (uint32_t a = 0; a < layer_count; a++) {<br>
> + uint32_t array_layer = base_layer + a;<br>
> + anv_cmd_predicated_ccs_resolve(cmd_buffer, image, aspect,<br>
> + level, array_layer, resolve_op,<br>
> + final_fast_clear);<br>
> }<br>
> -<br>
> - genX(load_needs_resolve_predicate)(cmd_buffer, image, aspect, level);<br>
> -<br>
> - anv_image_ccs_op(cmd_buffer, image, aspect, level,<br>
> - base_layer, layer_count, resolve_op, true);<br>
> -<br>
> - genX(set_image_needs_resolve)(cmd_buffer, image, aspect, level, false);<br>
> }<br>
><br>
> cmd_buffer->state.pending_pipe_bits |=<br>
> @@ -3132,28 +3298,26 @@ cmd_buffer_subpass_sync_fast_clear_values(struct anv_cmd_buffer *cmd_buffer)<br>
> genX(copy_fast_clear_dwords)(cmd_buffer, att_state->color.state,<br>
> iview->image,<br>
> VK_IMAGE_ASPECT_COLOR_BIT,<br>
> - iview->planes[0].isl.base_level,<br>
> true /* copy from ss */);<br>
><br>
> /* Fast-clears impact whether or not a resolve will be necessary. */<br>
> - if (iview->image->planes[0].aux_usage == ISL_AUX_USAGE_CCS_E &&<br>
> - att_state->clear_color_is_zero) {<br>
> + if (att_state->clear_color_is_zero) {<br>
> /* This image always has the auxiliary buffer enabled. We can mark<br>
> * the subresource as not needing a resolve because the clear color<br>
> * will match what's in every RENDER_SURFACE_STATE object when it's<br>
> * being used for sampling.<br>
> */<br>
> - genX(set_image_needs_resolve)(cmd_buffer, iview->image,<br>
> - VK_IMAGE_ASPECT_COLOR_BIT,<br>
> - iview->planes[0].isl.base_level,<br>
> - false);<br>
> + set_image_fast_clear_state(cmd_buffer, iview->image,<br>
> + VK_IMAGE_ASPECT_COLOR_BIT,<br>
> + ANV_FAST_CLEAR_ZERO_ONLY);<br>
> } else {<br>
> - genX(set_image_needs_resolve)(cmd_buffer, iview->image,<br>
> - VK_IMAGE_ASPECT_COLOR_BIT,<br>
> - iview->planes[0].isl.base_level,<br>
> - true);<br>
> + set_image_fast_clear_state(cmd_buffer, iview->image,<br>
> + VK_IMAGE_ASPECT_COLOR_BIT,<br>
> + ANV_FAST_CLEAR_ANY);<br>
> }<br>
> - } else if (rp_att->load_op == VK_ATTACHMENT_LOAD_OP_LOAD) {<br>
> + } else if (rp_att->load_op == VK_ATTACHMENT_LOAD_OP_LOAD &&<br>
> + iview->planes[0].isl.base_level == 0 &&<br>
> + iview->planes[0].isl.base_array_layer == 0) {<br>
> /* The attachment may have been fast-cleared in a previous render<br>
> * pass and the value is needed now. Update the surface state(s).<br>
> *<br>
> @@ -3162,7 +3326,6 @@ cmd_buffer_subpass_sync_fast_clear_values(struct anv_cmd_buffer *cmd_buffer)<br>
> genX(copy_fast_clear_dwords)(cmd_buffer, att_state->color.state,<br>
> iview->image,<br>
> VK_IMAGE_ASPECT_COLOR_BIT,<br>
> - iview->planes[0].isl.base_level,<br>
> false /* copy to ss */);<br>
><br>
> if (need_input_attachment_state(rp_att) &&<br>
> @@ -3170,7 +3333,6 @@ cmd_buffer_subpass_sync_fast_clear_values(struct anv_cmd_buffer *cmd_buffer)<br>
> genX(copy_fast_clear_dwords)(cmd_buffer, att_state->input.state,<br>
> iview->image,<br>
> VK_IMAGE_ASPECT_COLOR_BIT,<br>
> - iview->planes[0].isl.base_level,<br>
> false /* copy to ss */);<br>
> }<br>
> }<br>
> --<br>
> 2.5.0.400.gff86faf<br>
><br>
</div></div><div class="gmail-HOEnZb"><div class="gmail-h5">> _______________________________________________<br>
> mesa-dev mailing list<br>
> <a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a><br>
> <a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/<wbr>mailman/listinfo/mesa-dev</a><br>
</div></div></blockquote></div><br></div></div>