<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Feb 7, 2018 at 1:39 PM, Nanley Chery <span dir="ltr"><<a href="mailto:nanleychery@gmail.com" target="_blank">nanleychery@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Wed, Feb 07, 2018 at 11:27:38AM -0800, Jason Ekstrand wrote:<br>
> On Wed, Feb 7, 2018 at 9:59 AM, Nanley Chery <<a href="mailto:nanleychery@gmail.com">nanleychery@gmail.com</a>> wrote:<br>
><br>
> > On Tue, Feb 06, 2018 at 05:57:26PM -0800, Jason Ekstrand wrote:<br>
> > > On Tue, Feb 6, 2018 at 5:09 PM, Nanley Chery <<a href="mailto:nanleychery@gmail.com">nanleychery@gmail.com</a>><br>
> > wrote:<br>
> > ><br>
> > > > On Mon, Feb 05, 2018 at 06:16:26PM -0800, Jason Ekstrand wrote:<br>
> > > > > This commit completely reworks aux tracking. This includes a number<br>
> > of<br>
> > > > > somewhat distinct changes:<br>
> > > > ><br>
> > > > > 1) Since we are no longer fast-clearing multiple slices, we only<br>
> > need<br>
> > > > > to track one fast clear color and one fast clear type.<br>
> > > > ><br>
> > > > > 2) We store two bits for fast clear instead of one to let us<br>
> > > > > distinguish between zero and non-zero fast clear colors. This is<br>
> > > > > needed so that we can do full resolves when transitioning to<br>
> > > > > PRESENT_SRC_KHR with gen9 CCS images where we allow zero clear<br>
> > > > > values in all sorts of places we wouldn't normally.<br>
> > > > ><br>
> > > > > 3) We now track compression state as a boolean separate from fast<br>
> > clear<br>
> > > > > type and this is tracked on a per-slice granularity.<br>
> > > > ><br>
> > > > > The previous scheme had some issues when it came to individual<br>
> > slices of<br>
> > > > > a multi-LOD images. In particular, we only tracked "needs resolve"<br>
> > > > > per-LOD but you could do a vkCmdPipelineBarrier that would only<br>
> > resolve<br>
> > > > > a portion of the image and would set "needs resolve" to false anyway.<br>
> > > > > Also, any transition from an undefined layout would reset the clear<br>
> > > > > color for the entire LOD regardless of whether or not there was some<br>
> > > > > clear color on some other slice.<br>
> > > > ><br>
> > > > > As far as full/partial resolves go, he assumptions of the previous<br>
> > > > > scheme held because the one case where we do need a full resolve when<br>
> > > > > CCS_E is enabled is for window-system images. Since we only ever<br>
> > > > > allowed X-tiled window-system images, CCS was entirely disabled on<br>
> > gen9+<br>
> > > > > and we never got CCS_E. With the advent of Y-tiled window-system<br>
> > > > > buffers, we now need to properly support doing a full resolve of<br>
> > images<br>
> > > > > marked CCS_E.<br>
> > > > ><br>
> > > > > v2 (Jason Ekstrand):<br>
> > > > > - Fix an bug in the compressed flag offset calculation<br>
> > > > > - Treat 3D images as multi-slice for the purposes of resolve<br>
> > tracking<br>
> > > > ><br>
> > > > > Reviewed-by: Topi Pohjolainen <<a href="mailto:topi.pohjolainen@intel.com">topi.pohjolainen@intel.com</a>><br>
> > > > > ---<br>
> > > > > src/intel/vulkan/anv_blorp.c | 3 +-<br>
> > > > > src/intel/vulkan/anv_image.c | 100 ++++++-----<br>
> > > > > src/intel/vulkan/anv_private.h | 60 ++++---<br>
> > > > > src/intel/vulkan/genX_cmd_<wbr>buffer.c | 340<br>
> > +++++++++++++++++++++++++++---<br>
> > > > -------<br>
> > > > > 4 files changed, 345 insertions(+), 158 deletions(-)<br>
> > > > ><br>
> > > > > diff --git a/src/intel/vulkan/anv_blorp.c<br>
> > b/src/intel/vulkan/anv_blorp.c<br>
> > > > > index 497ae6f..fc3b717 100644<br>
> > > > > --- a/src/intel/vulkan/anv_blorp.c<br>
> > > > > +++ b/src/intel/vulkan/anv_blorp.c<br>
> > > > > @@ -1758,8 +1758,7 @@ anv_image_ccs_op(struct anv_cmd_buffer<br>
> > *cmd_buffer,<br>
> > > > > * particular value and don't care about format or clear<br>
> > value.<br>
> > > > > */<br>
> > > > > const struct anv_address clear_color_addr =<br>
> > > > > - anv_image_get_clear_color_<wbr>addr(cmd_buffer->device, image,<br>
> > > > > - aspect, level);<br>
> > > > > + anv_image_get_clear_color_<wbr>addr(cmd_buffer->device, image,<br>
> > > > aspect);<br>
> > > > > surf.clear_color_addr = anv_to_blorp_address(clear_<br>
> > color_addr);<br>
> > > > > }<br>
> > > > ><br>
> > > > > diff --git a/src/intel/vulkan/anv_image.c<br>
> > b/src/intel/vulkan/anv_image.c<br>
> > > > > index 11942d0..011e952 100644<br>
> > > > > --- a/src/intel/vulkan/anv_image.c<br>
> > > > > +++ b/src/intel/vulkan/anv_image.c<br>
> > > > > @@ -190,46 +190,54 @@ all_formats_ccs_e_compatible(<wbr>const struct<br>
> > > > gen_device_info *devinfo,<br>
> > > > > * fast-clear values in non-trivial cases (e.g., outside of a render<br>
> > > > pass in<br>
> > > > > * which a fast clear has occurred).<br>
> > > > > *<br>
> > > > > - * For the purpose of discoverability, the algorithm used to manage<br>
> > > > this buffer<br>
> > > > > - * is described here. A clear value in this buffer is updated when a<br>
> > > > fast clear<br>
> > > > > - * is performed on a subresource. One of two synchronization<br>
> > operations<br>
> > > > is<br>
> > > > > - * performed in order for a following memory access to use the<br>
> > > > fast-clear<br>
> > > > > - * value:<br>
> > > > > - * a. Copy the value from the buffer to the surface state object<br>
> > > > used for<br>
> > > > > - * reading. This is done implicitly when the value is the<br>
> > clear<br>
> > > > value<br>
> > > > > - * predetermined to be the default in other surface state<br>
> > > > objects. This<br>
> > > > > - * is currently only done explicitly for the operation below.<br>
> > > > > - * b. Do (a) and use the surface state object to resolve the<br>
> > > > subresource.<br>
> > > > > - * This is only done during layout transitions for decent<br>
> > > > performance.<br>
> > > > > + * In order to avoid having multiple clear colors for a single<br>
> > plane of<br>
> > > > an<br>
> > > > > + * image (hence a single RENDER_SURFACE_STATE), we only allow<br>
> > > > fast-clears on<br>
> > > > > + * the first slice (level 0, layer 0). At the time of our testing<br>
> > (Jan<br>
> > > > 17,<br>
> > > > > + * 2018), there were no known applications which would benefit from<br>
> > > > fast-<br>
> > > > > + * clearing more than just the first slice.<br>
> > > > > *<br>
> > > > > - * With the above scheme, we can fast-clear whenever the hardware<br>
> > > > allows except<br>
> > > > > - * for two cases in which synchronization becomes impossible or<br>
> > > > undesirable:<br>
> > > > > - * * The subresource is in the GENERAL layout and is cleared to a<br>
> > > > value<br>
> > > > > - * other than the special default value.<br>
> > > > > + * The fast clear portion of the image is laid out in the following<br>
> > > > order:<br>
> > > > > *<br>
> > > > > - * Performing a synchronization operation in order to read<br>
> > from the<br>
> > > > > - * subresource is undesirable in this case. Firstly, b) is not<br>
> > an<br>
> > > > option<br>
> > > > > - * because a layout transition isn't required between a write<br>
> > and<br>
> > > > read of<br>
> > > > > - * an image in the GENERAL layout. Secondly, it's undesirable<br>
> > to<br>
> > > > do a)<br>
> > > > > - * explicitly because it would require large infrastructural<br>
> > > > changes. The<br>
> > > > > - * Vulkan API supports us in deciding not to optimize this<br>
> > layout<br>
> > > > by<br>
> > > > > - * stating that using this layout may cause suboptimal<br>
> > > > performance. NOTE:<br>
> > > > > - * the auxiliary buffer must always be enabled to support a)<br>
> > > > implicitly.<br>
> > > > > + * * 1 or 4 dwords (depending on hardware generation) for the clear<br>
> > > > color<br>
> > > > > + * * 1 dword for the anv_fast_clear_type of the clear color<br>
> > > > > + * * On gen9+, 1 dword per level and layer of the image (3D levels<br>
> > > > count<br>
> > > > > + * multiple layers) in level-major order for compression state.<br>
> > > > > *<br>
> > > > > + * For the purpose of discoverability, the algorithm used to manage<br>
> > > > > + * compression and fast-clears is described here:<br>
> > > > > *<br>
> > > > > - * * For the given miplevel, only some of the layers are cleared<br>
> > at<br>
> > > > once.<br>
> > > > > + * * On a transition from UNDEFINED or PREINITIALIZED to a defined<br>
> > > > layout,<br>
> > > > > + * all of the values in the fast clear portion of the image are<br>
> > > > initialized<br>
> > > > > + * to default values.<br>
> > > > > *<br>
> > > > > - * If the user clears each layer to a different value, then<br>
> > tries<br>
> > > > to<br>
> > > > > - * render to multiple layers at once, we have no ability to<br>
> > > > perform a<br>
> > > > > - * synchronization operation in between. a) is not helpful<br>
> > because<br>
> > > > the<br>
> > > > > - * object can only hold one clear value. b) is not an option<br>
> > > > because a<br>
> > > > > - * layout transition isn't required in this case.<br>
> > > > > + * * On fast-clear, the clear value is written into surface state<br>
> > and<br>
> > > > also<br>
> > > > > + * into the buffer and the fast clear type is set appropriately.<br>
> > > > Both<br>
> > > > > + * setting the fast-clear value in the buffer and setting the<br>
> > > > fast-clear<br>
> > > > > + * type happen from the GPU using MI commands.<br>
> > > ><br>
> > > > There's no mention about setting the compression dwords during<br>
> > > > fast-clear operations or on render target writes.<br>
> > > ><br>
> > ><br>
> > > I've added another bullet:<br>
> > ><br>
> > > * * Whenever a render or blorp operation is performed with CCS_E, we<br>
> > call<br>
> > > * anv_cmd_buffer_mark_image_<wbr>written to set the compression state to<br>
> > 1.<br>
> > ><br>
> > ><br>
> ><br>
> > That'll work. I think we should replace<br>
> > anv_cmd_buffer_mark_image_<wbr>written with<br>
> > genX(cmd_buffer_mark_image_<wbr>written) though, since the former's a wrapper<br>
> > that only gets called for blorp operations.<br>
> ><br>
><br>
> Done.<br>
><br>
><br>
> > > > > + *<br>
> > > > > + * * On pipeline barrier transitions, the worst-case transition is<br>
> > > > computed<br>
> > > > > + * from the image layouts. The command streamer inspects the<br>
> > fast<br>
> > > > clear<br>
> > > > > + * type and compression state dwords and constructs a<br>
> > predicate. The<br>
> > > > > + * worst-case resolve is performed with the given predicate and<br>
> > the<br>
> > > > fast<br>
> > > > > + * clear and compression state is set accordingly.<br>
> > > > > + *<br>
> > > > > + * See anv_layout_to_aux_usage and anv_layout_to_fast_clear_type<br>
> > > > functions for<br>
> > > > > + * details on exactly what is allowed in what layouts.<br>
> > > > > + *<br>
> > > > > + * On gen7-9, we do not have a concept of indirect clear colors in<br>
> > > > hardware.<br>
> > > > > + * In order to deal with this, we have to do some clear color<br>
> > > > management.<br>
> > > > > + *<br>
> > > > > + * * For LOAD_OP_LOAD at the top of a renderpass, we have to copy<br>
> > the<br>
> > > > clear<br>
> > > > > + * value from the buffer into the surface state with MI commands.<br>
> > > > > + *<br>
> > > > > + * * For any blorp operations, we pass the address to the clear<br>
> > value<br>
> > > > into<br>
> > > > > + * blorp and it knows to copy the clear color.<br>
> > > > > */<br>
> > > > > static void<br>
> > > > > -add_fast_clear_state_buffer(<wbr>struct anv_image *image,<br>
> > > > > - VkImageAspectFlagBits aspect,<br>
> > > > > - uint32_t plane,<br>
> > > > > - const struct anv_device *device)<br>
> > > > > +add_aux_state_tracking_<wbr>buffer(struct anv_image *image,<br>
> > > > > + VkImageAspectFlagBits aspect,<br>
> > > > > + uint32_t plane,<br>
> > > > > + const struct anv_device *device)<br>
> > > ><br>
> > > > The comment referencing this function in genX_cmd_buffer.c needs to be<br>
> > > > updated.<br>
> > > ><br>
> > ><br>
> > > I don't see that comment anywhere.<br>
> > ><br>
> > ><br>
> ><br>
> > Maybe you delete it in a future series? This is the one I have in mind:<br>
> ><br>
> > /* We only allow fast clears in the GENERAL layout if the auxiliary<br>
> > * buffer is always enabled and the fast-clear value is all 0's. See<br>
> > * add_fast_clear_state_buffer() for more information.<br>
> > */<br>
> ><br>
><br>
> Right. Somehow my grepping missed it. Fixed.<br>
><br>
><br>
> > > > > {<br>
> > > > > assert(image && device);<br>
> > > > > assert(image->planes[plane].<wbr>aux_surface.isl.size > 0 &&<br>
> > > > > @@ -251,20 +259,24 @@ add_fast_clear_state_buffer(<wbr>struct anv_image<br>
> > > > *image,<br>
> > > > > (image->planes[plane].offset +<br>
> > image->planes[plane].size));<br>
> > > > > }<br>
> > > > ><br>
> > > > > - const unsigned entry_size = anv_fast_clear_state_entry_<br>
> > size(device);<br>
> > > > > - /* There's no padding between entries, so ensure that they're<br>
> > always<br>
> > > > a<br>
> > > > > - * multiple of 32 bits in order to enable GPU memcpy operations.<br>
> > > > > - */<br>
> > > > > - assert(entry_size % 4 == 0);<br>
> > > > > + /* Clear color and fast clear type */<br>
> > > > > + unsigned state_size = device->isl_dev.ss.clear_<wbr>value_size + 4;<br>
> > > > ><br>
> > > > > - const unsigned plane_state_size =<br>
> > > > > - entry_size * anv_image_aux_levels(image, aspect);<br>
> > > > > + /* We only need to track compression on CCS_E surfaces. */<br>
> > > > > + if (image->planes[plane].aux_<wbr>usage == ISL_AUX_USAGE_CCS_E) {<br>
> > > > > + if (image->type == VK_IMAGE_TYPE_3D) {<br>
> > > > > + for (uint32_t l = 0; l < image->levels; l++)<br>
> > > > > + state_size += anv_minify(image->extent.<wbr>depth, l) * 4;<br>
> > > > > + } else {<br>
> > > > > + state_size += image->levels * image->array_size * 4;<br>
> > > > > + }<br>
> > > > > + }<br>
> > > > ><br>
> > > > > image->planes[plane].fast_<wbr>clear_state_offset =<br>
> > > > > image->planes[plane].offset + image->planes[plane].size;<br>
> > > > ><br>
> > > > > - image->planes[plane].size += plane_state_size;<br>
> > > > > - image->size += plane_state_size;<br>
> > > > > + image->planes[plane].size += state_size;<br>
> > > > > + image->size += state_size;<br>
> > > > > }<br>
> > > > ><br>
> > > > > /**<br>
> > > > > @@ -437,7 +449,7 @@ make_surface(const struct anv_device *dev,<br>
> > > > > }<br>
> > > > ><br>
> > > > > add_surface(image, &image->planes[plane].aux_<wbr>surface,<br>
> > > > plane);<br>
> > > > > - add_fast_clear_state_buffer(<wbr>image, aspect, plane, dev);<br>
> > > > > + add_aux_state_tracking_buffer(<wbr>image, aspect, plane,<br>
> > dev);<br>
> > > > ><br>
> > > > > /* For images created without MUTABLE_FORMAT_BIT set, we<br>
> > > > know that<br>
> > > > > * they will always be used with the original format.<br>
> > In<br>
> > > > > @@ -461,7 +473,7 @@ make_surface(const struct anv_device *dev,<br>
> > > > > &image->planes[plane].aux_<br>
> > > > surface.isl);<br>
> > > > > if (ok) {<br>
> > > > > add_surface(image, &image->planes[plane].aux_<wbr>surface,<br>
> > plane);<br>
> > > > > - add_fast_clear_state_buffer(<wbr>image, aspect, plane, dev);<br>
> > > > > + add_aux_state_tracking_buffer(<wbr>image, aspect, plane, dev);<br>
> > > > > image->planes[plane].aux_usage = ISL_AUX_USAGE_MCS;<br>
> > > > > }<br>
> > > > > }<br>
> > > > > diff --git a/src/intel/vulkan/anv_<wbr>private.h b/src/intel/vulkan/anv_<br>
> > > > private.h<br>
> > > > > index 5f82702..d38dd9e 100644<br>
> > > > > --- a/src/intel/vulkan/anv_<wbr>private.h<br>
> > > > > +++ b/src/intel/vulkan/anv_<wbr>private.h<br>
> > > > > @@ -2533,50 +2533,58 @@ anv_image_aux_layers(const struct anv_image *<br>
> > > > const image,<br>
> > > > > }<br>
> > > > > }<br>
> > > > ><br>
> > > > > -static inline unsigned<br>
> > > > > -anv_fast_clear_state_entry_<wbr>size(const struct anv_device *device)<br>
> > > > > -{<br>
> > > > > - assert(device);<br>
> > > > > - /* Entry contents:<br>
> > > > > - * +-----------------------------<wbr>---------------+<br>
> > > > > - * | clear value dword(s) | needs resolve dword |<br>
> > > > > - * +-----------------------------<wbr>---------------+<br>
> > > > > - */<br>
> > > > > -<br>
> > > > > - /* Ensure that the needs resolve dword is in fact dword-aligned<br>
> > to<br>
> > > > enable<br>
> > > > > - * GPU memcpy operations.<br>
> > > > > - */<br>
> > > > > - assert(device->isl_dev.ss.<wbr>clear_value_size % 4 == 0);<br>
> > > > > - return device->isl_dev.ss.clear_<wbr>value_size + 4;<br>
> > > > > -}<br>
> > > > > -<br>
> > > > > static inline struct anv_address<br>
> > > > > anv_image_get_clear_color_<wbr>addr(const struct anv_device *device,<br>
> > > > > const struct anv_image *image,<br>
> > > > > - VkImageAspectFlagBits aspect,<br>
> > > > > - unsigned level)<br>
> > > > > + VkImageAspectFlagBits aspect)<br>
> > > > > {<br>
> > > > > + assert(image->aspects & VK_IMAGE_ASPECT_ANY_COLOR_BIT_<wbr>ANV);<br>
> > > > > +<br>
> > > > > uint32_t plane = anv_image_aspect_to_plane(<wbr>image->aspects,<br>
> > aspect);<br>
> > > > > return (struct anv_address) {<br>
> > > > > .bo = image->planes[plane].bo,<br>
> > > > > .offset = image->planes[plane].bo_offset +<br>
> > > > > - image->planes[plane].fast_<wbr>clear_state_offset +<br>
> > > > > - anv_fast_clear_state_entry_<wbr>size(device) * level,<br>
> > > > > + image->planes[plane].fast_<wbr>clear_state_offset,<br>
> > > > > };<br>
> > > > > }<br>
> > > > ><br>
> > > > > static inline struct anv_address<br>
> > > > > -anv_image_get_needs_resolve_<wbr>addr(const struct anv_device *device,<br>
> > > > > - const struct anv_image *image,<br>
> > > > > - VkImageAspectFlagBits aspect,<br>
> > > > > - unsigned level)<br>
> > > > > +anv_image_get_fast_clear_<wbr>type_addr(const struct anv_device *device,<br>
> > > > > + const struct anv_image *image,<br>
> > > > > + VkImageAspectFlagBits aspect)<br>
> > > > > {<br>
> > > > > struct anv_address addr =<br>
> > > > > - anv_image_get_clear_color_<wbr>addr(device, image, aspect, level);<br>
> > > > > + anv_image_get_clear_color_<wbr>addr(device, image, aspect);<br>
> > > > > addr.offset += device->isl_dev.ss.clear_<wbr>value_size;<br>
> > > > > return addr;<br>
> > > > > }<br>
> > > > ><br>
> > > > > +static inline struct anv_address<br>
> > > > > +anv_image_get_compression_<wbr>state_addr(const struct anv_device<br>
> > *device,<br>
> > > > > + const struct anv_image *image,<br>
> > > > > + VkImageAspectFlagBits aspect,<br>
> > > > > + uint32_t level, uint32_t<br>
> > > > array_layer)<br>
> > > > > +{<br>
> > > > > + assert(level < anv_image_aux_levels(image, aspect));<br>
> > > > > + assert(array_layer < anv_image_aux_layers(image, aspect, level));<br>
> > > > > + UNUSED uint32_t plane = anv_image_aspect_to_plane(<br>
> > image->aspects,<br>
> > > > aspect);<br>
> > > > > + assert(image->planes[plane].<wbr>aux_usage == ISL_AUX_USAGE_CCS_E);<br>
> > > > > +<br>
> > > > > + struct anv_address addr =<br>
> > > > > + anv_image_get_fast_clear_type_<wbr>addr(device, image, aspect);<br>
> > > > > + addr.offset += 4; /* Go past the fast clear type */<br>
> > > > > +<br>
> > > > > + if (image->type == VK_IMAGE_TYPE_3D) {<br>
> > > > > + for (uint32_t l = 0; l < level; l++)<br>
> > > > > + addr.offset += anv_minify(image->extent.<wbr>depth, l) * 4;<br>
> > > > > + } else {<br>
> > > > > + addr.offset += level * image->array_size * 4;<br>
> > > > > + }<br>
> > > > > + addr.offset += array_layer * 4;<br>
> > > > > +<br>
> > > > > + return addr;<br>
> > > > > +}<br>
> > > > > +<br>
> > > > > /* Returns true if a HiZ-enabled depth buffer can be sampled from.<br>
> > */<br>
> > > > > static inline bool<br>
> > > > > anv_can_sample_with_hiz(const struct gen_device_info * const<br>
> > devinfo,<br>
> > > > > diff --git a/src/intel/vulkan/genX_cmd_<wbr>buffer.c<br>
> > > > b/src/intel/vulkan/genX_cmd_<wbr>buffer.c<br>
> > > > > index e1a4d95..4c75e0c 100644<br>
> > > > > --- a/src/intel/vulkan/genX_cmd_<wbr>buffer.c<br>
> > > > > +++ b/src/intel/vulkan/genX_cmd_<wbr>buffer.c<br>
> > > > > @@ -407,27 +407,45 @@ transition_depth_buffer(struct anv_cmd_buffer<br>
> > > > *cmd_buffer,<br>
> > > > > #define MI_PREDICATE_SRC0 0x2400<br>
> > > > > #define MI_PREDICATE_SRC1 0x2408<br>
> > > > ><br>
> > > > > -/* Manages the state of an color image subresource to ensure<br>
> > resolves<br>
> > > > are<br>
> > > > > - * performed properly.<br>
> > > > > - */<br>
> > > > > static void<br>
> > > > > -genX(set_image_needs_resolve)<wbr>(struct anv_cmd_buffer *cmd_buffer,<br>
> > > > > - const struct anv_image *image,<br>
> > > > > - VkImageAspectFlagBits aspect,<br>
> > > > > - unsigned level, bool needs_resolve)<br>
> > > > > +set_image_fast_clear_state(<wbr>struct anv_cmd_buffer *cmd_buffer,<br>
> > > > > + const struct anv_image *image,<br>
> > > > > + VkImageAspectFlagBits aspect,<br>
> > > > > + enum anv_fast_clear_type fast_clear)<br>
> > > > > {<br>
> > > > > - assert(cmd_buffer && image);<br>
> > > > > - assert(image->aspects & VK_IMAGE_ASPECT_ANY_COLOR_BIT_<wbr>ANV);<br>
> > > > > - assert(level < anv_image_aux_levels(image, aspect));<br>
> > > > > -<br>
> > > > > - /* The HW docs say that there is no way to guarantee the<br>
> > completion<br>
> > > > of<br>
> > > > > - * the following command. We use it nevertheless because it<br>
> > shows no<br>
> > > > > - * issues in testing is currently being used in the GL driver.<br>
> > > > > - */<br>
> > > > > anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(MI_STORE_DATA_IMM),<br>
> > sdi) {<br>
> > > > > - sdi.Address = anv_image_get_needs_resolve_<br>
> > > > addr(cmd_buffer->device,<br>
> > > > > - image, aspect,<br>
> > > > level);<br>
> > > > > - sdi.ImmediateData = needs_resolve;<br>
> > > > > + sdi.Address = anv_image_get_fast_clear_type_<br>
> > > > addr(cmd_buffer->device,<br>
> > > > > + image,<br>
> > aspect);<br>
> > > > > + sdi.ImmediateData = fast_clear;<br>
> > > > > + }<br>
> > > > > +}<br>
> > > > > +<br>
> > > > > +static void<br>
> > > > > +set_image_compressed_bit(<wbr>struct anv_cmd_buffer *cmd_buffer,<br>
> > > > > + const struct anv_image *image,<br>
> > > > > + VkImageAspectFlagBits aspect,<br>
> > > > > + uint32_t level,<br>
> > > > > + uint32_t base_layer, uint32_t layer_count,<br>
> > > > > + bool compressed)<br>
> > > > > +{<br>
> > > > > + /* We only have CCS_E on gen9+ */<br>
> > > > > + if (GEN_GEN < 9)<br>
> > > > > + return;<br>
> > > > > +<br>
> > > ><br>
> > > > I'm also fine with dropping this if statement.<br>
> > > ><br>
> > ><br>
> > > Ok, we'll drop it.<br>
> > ><br>
> > ><br>
> > > > > + uint32_t plane = anv_image_aspect_to_plane(<wbr>image->aspects,<br>
> > aspect);<br>
> > > > > +<br>
> > > > > + /* We only have compression tracking for CCS_E */<br>
> > > > > + if (image->planes[plane].aux_<wbr>usage != ISL_AUX_USAGE_CCS_E)<br>
> > > > > + return;<br>
> > > > > +<br>
> > > > > + for (uint32_t a = 0; a < layer_count; a++) {<br>
> > > > > + uint32_t layer = base_layer + a;<br>
> > > > > + anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(MI_STORE_DATA_IMM),<br>
> > sdi)<br>
> > > > {<br>
> > > > > + sdi.Address = anv_image_get_compression_<br>
> > > > state_addr(cmd_buffer->device,<br>
> > > > > + image,<br>
> > > > aspect,<br>
> > > > > + level,<br>
> > > > layer);<br>
> > > > > + sdi.ImmediateData = compressed ? UINT32_MAX : 0;<br>
> > > > > + }<br>
> > > > > }<br>
> > > > > }<br>
> > > > ><br>
> > > > > @@ -451,32 +469,176 @@ mi_alu(uint32_t opcode, uint32_t operand1,<br>
> > > > uint32_t operand2)<br>
> > > > > #define CS_GPR(n) (0x2600 + (n) * 8)<br>
> > > > ><br>
> > > > > static void<br>
> > > > > -genX(load_needs_resolve_<wbr>predicate)(struct anv_cmd_buffer<br>
> > *cmd_buffer,<br>
> > > > > - const struct anv_image *image,<br>
> > > > > - VkImageAspectFlagBits aspect,<br>
> > > > > - unsigned level)<br>
> > > > > +anv_cmd_predicated_ccs_<wbr>resolve(struct anv_cmd_buffer *cmd_buffer,<br>
> > > > > + const struct anv_image *image,<br>
> > > > > + VkImageAspectFlagBits aspect,<br>
> > > > > + uint32_t level, uint32_t array_layer,<br>
> > > > > + enum isl_aux_op resolve_op,<br>
> > > > > + enum anv_fast_clear_type<br>
> > > > fast_clear_supported)<br>
> > > > > {<br>
> > > > > - assert(cmd_buffer && image);<br>
> > > > > - assert(image->aspects & VK_IMAGE_ASPECT_ANY_COLOR_BIT_<wbr>ANV);<br>
> > > > > - assert(level < anv_image_aux_levels(image, aspect));<br>
> > > > > + struct anv_address fast_clear_type_addr =<br>
> > > > > + anv_image_get_fast_clear_type_<wbr>addr(cmd_buffer->device, image,<br>
> > > > aspect);<br>
> > > > > +<br>
> > > > > +#if GEN_GEN >= 9<br>
> > > > > + const uint32_t plane = anv_image_aspect_to_plane(<wbr>image->aspects,<br>
> > > > aspect);<br>
> > > > > + const bool decompress =<br>
> > > > > + resolve_op == ISL_AUX_OP_FULL_RESOLVE &&<br>
> > > > > + image->planes[plane].aux_usage == ISL_AUX_USAGE_CCS_E;<br>
> > > > > +<br>
> > > > > + /* This function shouldn't get called if it isn't going to do<br>
> > > > anything */<br>
> > > > > + assert(decompress || fast_clear_supported < ANV_FAST_CLEAR_ANY);<br>
> > > > > +<br>
> > > > > + if (level == 0 && array_layer == 0) {<br>
> > > > > + /* This is the complex case because we have to worry about<br>
> > > > dealing with<br>
> > > > > + * the fast clear color. Unfortunately, it's also the common<br>
> > > > case.<br>
> > > > > + */<br>
> > > > > +<br>
> > > > > + /* Poor-man's register allocation */<br>
> > > > > + int next_reg = MI_ALU_REG0;<br>
> > > > > + int pred_reg = -1;<br>
> > > > > +<br>
> > > > > + /* Needed for ALU operations */<br>
> > > > > + uint32_t *dw;<br>
> > > > > +<br>
> > > > > + const int image_fc = next_reg++;<br>
> > > > > + anv_batch_emit(&cmd_buffer-><wbr>batch,<br>
> > GENX(MI_LOAD_REGISTER_MEM),<br>
> > > > lrm) {<br>
> > > > > + lrm.RegisterAddress = CS_GPR(image_fc);<br>
> > > > > + lrm.MemoryAddress = fast_clear_type_addr;<br>
> > > > > + }<br>
> > > > > + emit_lri(&cmd_buffer->batch, CS_GPR(image_fc) + 4, 0);<br>
> > > > > +<br>
> > > > > + if (fast_clear_supported < ANV_FAST_CLEAR_ANY) {<br>
> > > > > + /* We need to compute (fast_clear_supported <<br>
> > > > image->fast_clear).<br>
> > > > > + * We do this by subtracting and storing the carry bit.<br>
> > > > > + */<br>
> > > > > + const int fc_imm = next_reg++;<br>
> > > > > + emit_lri(&cmd_buffer->batch, CS_GPR(fc_imm),<br>
> > > > fast_clear_supported);<br>
> > > > > + emit_lri(&cmd_buffer->batch, CS_GPR(fc_imm) + 4, 0);<br>
> > > > > +<br>
> > > > > + assert(pred_reg == -1);<br>
> > > > > + pred_reg = next_reg++;<br>
> > > > > +<br>
> > > > > + dw = anv_batch_emitn(&cmd_buffer-><wbr>batch, 5,<br>
> > GENX(MI_MATH));<br>
> > > > > + dw[1] = mi_alu(MI_ALU_LOAD, MI_ALU_SRCA, fc_imm);<br>
> > > > > + dw[2] = mi_alu(MI_ALU_LOAD, MI_ALU_SRCB, image_fc);<br>
> > > > > + dw[3] = mi_alu(MI_ALU_SUB, 0, 0);<br>
> > > > > + dw[4] = mi_alu(MI_ALU_STORE, pred_reg, MI_ALU_CF);<br>
> > > > > + }<br>
> > > > > +<br>
> > > > > + if (decompress) {<br>
> > > > > + /* If we're doing a full resolve, we need the compression<br>
> > > > state */<br>
> > > > > + struct anv_address compression_state_addr =<br>
> > > > > + anv_image_get_compression_<br>
> > state_addr(cmd_buffer->device,<br>
> > > > image,<br>
> > > > > + aspect, level,<br>
> > > > array_layer);<br>
> > > > > + if (pred_reg == -1) {<br>
> > > > > + pred_reg = next_reg++;<br>
> > > > > + anv_batch_emit(&cmd_buffer-><wbr>batch,<br>
> > > > GENX(MI_LOAD_REGISTER_MEM), lrm) {<br>
> > > > > + lrm.RegisterAddress = CS_GPR(pred_reg);<br>
> > > > > + lrm.MemoryAddress = compression_state_addr;<br>
> > > > > + }<br>
> > > ><br>
> > ><br>
> > > In this case, we don't have a fast-clear predicate. This is because "if<br>
> > > (fast_clear_supported < ANV_FAST_CLEAR_ANY)" failed so we didn't compute<br>
> > > one. In this case, we treat the fast-clear predicate as being 0 and OR<br>
> > > with 0 is just the original value.<br>
> > ><br>
> > ><br>
> > ><br>
> > > > > + } else {<br>
> > > > > + /* OR the compression state into the predicate. The<br>
> > > > compression<br>
> > > > > + * state is already in 0/~0 form.<br>
> > > > > + */<br>
> > > ><br>
> > > > The OR is a result of this function's structure and wouldn't be needed<br>
> > > > if the control flow in this function was different right?<br>
> > > ><br>
> > ><br>
> > > In this case, we have both a fast-clear predicate and a compression<br>
> > > predicate. We have to OR them together.<br>
> > ><br>
> > > I'm not sure what you mean about control-flow in this function being<br>
> > > different. Yes, I'm probably over-optimizing a bit to avoid unneeded CS<br>
> > > operations.<br>
> > ><br>
> > ><br>
> ><br>
> > By that, I'm referring to the organization of the if's and else's in<br>
> > this function - though that still isn't very exact/clear. Sorry for the<br>
> > confusion.<br>
> ><br>
> > It seems like the OR is one more CS operation than neeeded. It's always<br>
> > the case that if we have some fast-clear data, the compression state<br>
> > will always be true.<br>
> ><br>
><br>
> I'm not sure if that's always true or not. I think it is in practice since<br>
> we only fast clear at the top of render passes and we always call<br>
> mark_image_written at the top of the render pass too. However, if the<br>
> first use of a fast-cleared image is an input attachment, this may not be<br>
> the case. Also, if we ever try to do something clever like you suggested<br>
> with PS depth count, this won't be the case. In GL, it's easy because all<br>
> you have to do is a fast-clear and some rendering as sRGB and you get into<br>
> the PARTIAL_CLEAR state. In any case, I don't know that we want them to be<br>
> coupled.<br>
><br>
><br>
<br>
</div></div>Clearing input attachments is not a problem because the attachment must<br>
also be used as a color attachment in the same subpass:<br>
<br>
If the first use of an attachment in this render pass is as an input<br>
attachment, and the attachment is not also used as a color or<br>
depth/stencil attachment in the same subpass, then loadOp must not be<br>
VK_ATTACHMENT_LOAD_OP_CLEAR<br>
<br>
Regardless of what we do with mark_image_written (use PS_DEPTH_COUNT,<br>
call it during vkDraw* commands), we could update the compression state<br>
as part of updating the fast clear state. What benefit would we get from<br>
allowing FC to be a non-zero value and COMP to be zero?<br><div><div class="h5"></div></div></blockquote><div><br></div><div>It's useful in GL because it lets us avoid any sort of resolve when going from CCS_E to CCS_D. In Vulkan, maybe it's not useful.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5">
> > Here's how I understand the states could look:<br>
> > FC | COMP<br>
> > ==========<br>
> > 0 | 0<br>
> > 0 | 1<br>
> > 1 | 1<br>
> > 2 | 1<br>
> ><br>
> > If that's the case we only have to check the compression_state when a<br>
> > decompress is called for. If we moved the "if (decompress) {}" block of<br>
> > the function higher we wouldn't need to OR the compression predicate<br>
> > with the fast clear predicate.<br>
> ><br>
><br>
> Ugh... This is subtle... If the above table is accurate (which, as I said<br>
> above, I'm not convinced it is), then I think your statement there is<br>
> correct.<br>
><br>
> I'm a bit torn. On the one hand, this code is complicated and we could<br>
> simplify it if we're willing to make the assumption above. On the other<br>
> hand, there are a bunch of subtle interactions here and I don't like subtle.<br></div></div></blockquote><div><br></div><div>As much as I hate to admit it, I think you may be starting to convince me. :-) As long as setting fast clear state also sets compressed state (so they never get out-of-sync), I think we can go ahead and make that assumption. I'll see how it simplifies the code.<br><br></div><div>--Jason<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5">
> > > > > + const int image_comp = next_reg++;<br>
> > > > > + anv_batch_emit(&cmd_buffer-><wbr>batch,<br>
> > > > GENX(MI_LOAD_REGISTER_MEM), lrm) {<br>
> > > > > + lrm.RegisterAddress = CS_GPR(image_comp);<br>
> > > > > + lrm.MemoryAddress = compression_state_addr;<br>
> > > > > + }<br>
> > > > > +<br>
> > > > > + dw = anv_batch_emitn(&cmd_buffer-><wbr>batch, 5,<br>
> > GENX(MI_MATH));<br>
> > > > > + dw[1] = mi_alu(MI_ALU_LOAD, MI_ALU_SRCA, pred_reg);<br>
> > > > > + dw[2] = mi_alu(MI_ALU_LOAD, MI_ALU_SRCB, image_comp);<br>
> > > > > + dw[3] = mi_alu(MI_ALU_OR, 0, 0);<br>
> > > > > + dw[4] = mi_alu(MI_ALU_STORE, pred_reg, MI_ALU_ACCU);<br>
> > > > > + }<br>
> > > > > +<br>
> > > > > + anv_batch_emit(&cmd_buffer-><wbr>batch,<br>
> > GENX(MI_STORE_DATA_IMM),<br>
> > > > sdi) {<br>
> > > > > + sdi.Address = compression_state_addr;<br>
> > > > > + sdi.ImmediateData = 0;<br>
> > > > > + }<br>
> > > > > + }<br>
> > > > ><br>
> > > > > - const struct anv_address resolve_flag_addr =<br>
> > > > > - anv_image_get_needs_resolve_<wbr>addr(cmd_buffer->device,<br>
> > > > > - image, aspect, level);<br>
> > > > > + /* Store the predicate */<br>
> > > > > + assert(pred_reg != -1);<br>
> > > > > + emit_lrr(&cmd_buffer->batch, MI_PREDICATE_SRC0,<br>
> > CS_GPR(pred_reg));<br>
> > > > ><br>
> > > > > - /* Make the pending predicated resolve a no-op if one is not<br>
> > needed.<br>
> > > > > - * predicate = do_resolve = resolve_flag != 0;<br>
> > > > > + /* If the predicate is true, we want to write 0 to the fast<br>
> > clear<br>
> > > > type<br>
> > > > > + * and, if it's false, leave it alone. We can do this by<br>
> > writing<br>
> > > > > + *<br>
> > > > > + * clear_type = clear_type & ~predicate;<br>
> > > > > + */<br>
> > > > > + dw = anv_batch_emitn(&cmd_buffer-><wbr>batch, 5, GENX(MI_MATH));<br>
> > > > > + dw[1] = mi_alu(MI_ALU_LOAD, MI_ALU_SRCA, image_fc);<br>
> > > > > + dw[2] = mi_alu(MI_ALU_LOADINV, MI_ALU_SRCB, pred_reg);<br>
> > > > > + dw[3] = mi_alu(MI_ALU_AND, 0, 0);<br>
> > > > > + dw[4] = mi_alu(MI_ALU_STORE, image_fc, MI_ALU_ACCU);<br>
> > > > > +<br>
> > > > > + anv_batch_emit(&cmd_buffer-><wbr>batch,<br>
> > GENX(MI_STORE_REGISTER_MEM),<br>
> > > > srm) {<br>
> > > > > + srm.RegisterAddress = CS_GPR(image_fc);<br>
> > > > > + srm.MemoryAddress = fast_clear_type_addr;<br>
> > > > > + }<br>
> > > > > + } else if (decompress) {<br>
> > > > > + /* We're trying to get rid of compression but we don't care<br>
> > about<br>
> > > > fast<br>
> > > > > + * clears so all we need is the compression predicate.<br>
> > > > > + */<br>
> > > > > + assert(resolve_op == ISL_AUX_OP_FULL_RESOLVE);<br>
> > > > > + struct anv_address compression_state_addr =<br>
> > > > > + anv_image_get_compression_<wbr>state_addr(cmd_buffer->device,<br>
> > > > image,<br>
> > > > > + aspect, level,<br>
> > > > array_layer);<br>
> > > > > + anv_batch_emit(&cmd_buffer-><wbr>batch,<br>
> > GENX(MI_LOAD_REGISTER_MEM),<br>
> > > > lrm) {<br>
> > > > > + lrm.RegisterAddress = MI_PREDICATE_SRC0;<br>
> > > > > + lrm.MemoryAddress = compression_state_addr;<br>
> > > > > + }<br>
> > > > > + anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(MI_STORE_DATA_IMM),<br>
> > sdi)<br>
> > > > {<br>
> > > > > + sdi.Address = compression_state_addr;<br>
> > > > > + sdi.ImmediateData = 0;<br>
> > > > > + }<br>
> ><br>
><br>
> If we want them decoupled, we probably want to do "compressed |= predicate"<br>
> here.<br>
><br>
><br>
> > > > > + } else {<br>
> > > > > + /* In this case, we're trying to do a partial resolve on a<br>
> > slice<br>
> > > > that<br>
> > > > > + * doesn't have clear color. There's nothing to do.<br>
> > > > > + */<br>
> > > > > + return;<br>
> > > > > + }<br>
> > > > > +<br>
> > > > > +#else /* GEN_GEN <= 8 */<br>
> > > > > + assert(resolve_op == ISL_AUX_OP_FULL_RESOLVE);<br>
> > > > > + assert(fast_clear_supported != ANV_FAST_CLEAR_ANY);<br>
> > > > > +<br>
> > > > > + /* We don't support fast clears on anything other than the first<br>
> > > > slice. */<br>
> > > > > + if (level > 0 || array_layer > 0)<br>
> > > > > + return;<br>
> > > > > +<br>
> > > > > + /* On gen8, we don't have a concept of default clear colors<br>
> > because<br>
> > > > we<br>
> > > > > + * can't sample from CCS surfaces. It's enough to just load the<br>
> > > > fast clear<br>
> > > > > + * state into the predicate register.<br>
> > > > > */<br>
> > > > > + anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(MI_LOAD_REGISTER_MEM),<br>
> > lrm)<br>
> > > > {<br>
> > > > > + lrm.RegisterAddress = MI_PREDICATE_SRC0;<br>
> > > > > + lrm.MemoryAddress = fast_clear_type_addr;<br>
> > > > > + }<br>
> > > > > + anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(MI_STORE_DATA_IMM),<br>
> > sdi) {<br>
> > > > > + sdi.Address = fast_clear_type_addr;<br>
> > > > > + sdi.ImmediateData = 0;<br>
> > > > > + }<br>
> > > > > +#endif<br>
> > > > > +<br>
> > > > > + /* We use the first half of src0 for the actual predicate. Set<br>
> > the<br>
> > > > second<br>
> > > > > + * half of src0 and all of src1 to 0 as the predicate operation<br>
> > will<br>
> > > > be<br>
> > > > > + * doing an implicit src0 != src1.<br>
> > > > > + */<br>
> > > > > + emit_lri(&cmd_buffer->batch, MI_PREDICATE_SRC0 + 4, 0);<br>
> > > > > emit_lri(&cmd_buffer->batch, MI_PREDICATE_SRC1 , 0);<br>
> > > > > emit_lri(&cmd_buffer->batch, MI_PREDICATE_SRC1 + 4, 0);<br>
> > > > > - emit_lri(&cmd_buffer->batch, MI_PREDICATE_SRC0 , 0);<br>
> > > > > - emit_lrm(&cmd_buffer->batch, MI_PREDICATE_SRC0 + 4,<br>
> > > > > - <a href="http://resolve_flag_addr.bo" rel="noreferrer" target="_blank">resolve_flag_addr.bo</a>, resolve_flag_addr.offset);<br>
> > > > > +<br>
> > > > > anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(MI_PREDICATE), mip) {<br>
> > > > > mip.LoadOperation = LOAD_LOADINV;<br>
> > > > > mip.CombineOperation = COMBINE_SET;<br>
> > > > > mip.CompareOperation = COMPARE_SRCS_EQUAL;<br>
> > > > > }<br>
> > > > > +<br>
> > > > > + anv_image_ccs_op(cmd_buffer, image, aspect, level,<br>
> > > > > + array_layer, 1, resolve_op, true);<br>
> > > > > }<br>
> > > > ><br>
> > > > > void<br>
> > > > > @@ -490,17 +652,30 @@ genX(cmd_buffer_mark_image_<wbr>written)(struct<br>
> > > > anv_cmd_buffer *cmd_buffer,<br>
> > > > > {<br>
> > > > > /* The aspect must be exactly one of the image aspects. */<br>
> > > > > assert(_mesa_bitcount(aspect) == 1 && (aspect & image->aspects));<br>
> > > > > +<br>
> > > > > + /* The only compression types with more than just fast-clears are<br>
> > > > MCS,<br>
> > > > > + * CCS_E, and HiZ. With HiZ we just trust the layout and don't<br>
> > > > actually<br>
> > > > > + * track the current fast-clear and compression state. This<br>
> > leaves<br>
> > > > us<br>
> > > > > + * with just MCS and CCS_E.<br>
> > > > > + */<br>
> > > > > + if (aux_usage != ISL_AUX_USAGE_CCS_E &&<br>
> > > > > + aux_usage != ISL_AUX_USAGE_MCS)<br>
> > > > > + return;<br>
> > > > > +<br>
> > > > > + set_image_compressed_bit(cmd_<wbr>buffer, image, aspect,<br>
> > > > > + level, base_layer, layer_count, true);<br>
> > > > > }<br>
> > > > ><br>
> > > > > static void<br>
> > > > > -init_fast_clear_state_entry(<wbr>struct anv_cmd_buffer *cmd_buffer,<br>
> > > > > - const struct anv_image *image,<br>
> > > > > - VkImageAspectFlagBits aspect,<br>
> > > > > - unsigned level)<br>
> > > > > +init_fast_clear_color(struct anv_cmd_buffer *cmd_buffer,<br>
> > > > > + const struct anv_image *image,<br>
> > > > > + VkImageAspectFlagBits aspect)<br>
> > > > > {<br>
> > > > > assert(cmd_buffer && image);<br>
> > > > > assert(image->aspects & VK_IMAGE_ASPECT_ANY_COLOR_BIT_<wbr>ANV);<br>
> > > > > - assert(level < anv_image_aux_levels(image, aspect));<br>
> > > > > +<br>
> > > > > + set_image_fast_clear_state(<wbr>cmd_buffer, image, aspect,<br>
> > > > > + ANV_FAST_CLEAR_NONE);<br>
> > > > ><br>
> > > > > uint32_t plane = anv_image_aspect_to_plane(<wbr>image->aspects,<br>
> > aspect);<br>
> > > > > enum isl_aux_usage aux_usage = image->planes[plane].aux_<wbr>usage;<br>
> > > > > @@ -517,7 +692,7 @@ init_fast_clear_state_entry(<wbr>struct<br>
> > anv_cmd_buffer<br>
> > > > *cmd_buffer,<br>
> > > > > * values in the clear value dword(s).<br>
> > > > > */<br>
> > > > > struct anv_address addr =<br>
> > > > > - anv_image_get_clear_color_<wbr>addr(cmd_buffer->device, image,<br>
> > > > aspect, level);<br>
> > > > > + anv_image_get_clear_color_<wbr>addr(cmd_buffer->device, image,<br>
> > > > aspect);<br>
> > > > > unsigned i = 0;<br>
> > > > > for (; i < cmd_buffer->device->isl_dev.<wbr>ss.clear_value_size; i<br>
> > += 4)<br>
> > > > {<br>
> > > > > anv_batch_emit(&cmd_buffer-><wbr>batch, GENX(MI_STORE_DATA_IMM),<br>
> > sdi)<br>
> > > > {<br>
> > > > > @@ -558,19 +733,17 @@ genX(copy_fast_clear_dwords)(<wbr>struct<br>
> > > > anv_cmd_buffer *cmd_buffer,<br>
> > > > > struct anv_state surface_state,<br>
> > > > > const struct anv_image *image,<br>
> > > > > VkImageAspectFlagBits aspect,<br>
> > > > > - unsigned level,<br>
> > > > > bool copy_from_surface_state)<br>
> > > > > {<br>
> > > > > assert(cmd_buffer && image);<br>
> > > > > assert(image->aspects & VK_IMAGE_ASPECT_ANY_COLOR_BIT_<wbr>ANV);<br>
> > > > > - assert(level < anv_image_aux_levels(image, aspect));<br>
> > > > ><br>
> > > > > struct anv_bo *ss_bo =<br>
> > > > > &cmd_buffer->device-><a href="http://surface_state_pool.block_pool.bo" rel="noreferrer" target="_blank">surface_<wbr>state_pool.block_pool.bo</a>;<br>
> > > > > uint32_t ss_clear_offset = surface_state.offset +<br>
> > > > > cmd_buffer->device->isl_dev.<wbr>ss.clear_value_offset;<br>
> > > > > const struct anv_address entry_addr =<br>
> > > > > - anv_image_get_clear_color_<wbr>addr(cmd_buffer->device, image,<br>
> > > > aspect, level);<br>
> > > > > + anv_image_get_clear_color_<wbr>addr(cmd_buffer->device, image,<br>
> > > > aspect);<br>
> > > > > unsigned copy_size = cmd_buffer->device->isl_dev.<br>
> > > > ss.clear_value_size;<br>
> > > > ><br>
> > > > > if (copy_from_surface_state) {<br>
> > > > > @@ -660,18 +833,6 @@ transition_color_buffer(struct anv_cmd_buffer<br>
> > > > *cmd_buffer,<br>
> > > > > if (base_layer >= anv_image_aux_layers(image, aspect,<br>
> > base_level))<br>
> > > > > return;<br>
> > > > ><br>
> > > > > - /* A transition of a 3D subresource works on all slices at a<br>
> > time. */<br>
> > > > > - if (image->type == VK_IMAGE_TYPE_3D) {<br>
> > > > > - base_layer = 0;<br>
> > > > > - layer_count = anv_minify(image->extent.<wbr>depth, base_level);<br>
> > > > > - }<br>
> > > > > -<br>
> > > > > - /* We're interested in the subresource range subset that has aux<br>
> > > > data. */<br>
> > > > > - level_count = MIN2(level_count, anv_image_aux_levels(image,<br>
> > aspect)<br>
> > > > - base_level);<br>
> > > > > - layer_count = MIN2(layer_count,<br>
> > > > > - anv_image_aux_layers(image, aspect,<br>
> > base_level) -<br>
> > > > base_layer);<br>
> > > > > - last_level_num = base_level + level_count;<br>
> > > > > -<br>
> > > > > assert(image->tiling == VK_IMAGE_TILING_OPTIMAL);<br>
> > > > ><br>
> > > > > if (initial_layout == VK_IMAGE_LAYOUT_UNDEFINED ||<br>
> > > > > @@ -684,8 +845,8 @@ transition_color_buffer(struct anv_cmd_buffer<br>
> > > > *cmd_buffer,<br>
> > > > > *<br>
> > > > > * Initialize the relevant clear buffer entries.<br>
> > > > > */<br>
> > > > > - for (unsigned level = base_level; level < last_level_num;<br>
> > level++)<br>
> > > > > - init_fast_clear_state_entry(<wbr>cmd_buffer, image, aspect,<br>
> > level);<br>
> > > > > + if (base_level == 0 && base_layer == 0)<br>
> > > > > + init_fast_clear_color(cmd_<wbr>buffer, image, aspect);<br>
> > > > ><br>
> > > > > /* Initialize the aux buffers to enable correct rendering. In<br>
> > > > order to<br>
> > > > > * ensure that things such as storage images work correctly,<br>
> > aux<br>
> > > > buffers<br>
> > > > > @@ -723,13 +884,18 @@ transition_color_buffer(struct anv_cmd_buffer<br>
> > > > *cmd_buffer,<br>
> > > > > if (image->samples == 1) {<br>
> > > > > for (uint32_t l = 0; l < level_count; l++) {<br>
> > > > > const uint32_t level = base_level + l;<br>
> > > > > - const uint32_t level_layer_count =<br>
> > > > > + uint32_t level_layer_count =<br>
> > > > > MIN2(layer_count, anv_image_aux_layers(image, aspect,<br>
> > > > level));<br>
> > > > > +<br>
> > > > > anv_image_ccs_op(cmd_buffer, image, aspect, level,<br>
> > > > > base_layer, level_layer_count,<br>
> > > > > ISL_AUX_OP_AMBIGUATE, false);<br>
> > > ><br>
> > > > On gen8, we're doing extra ambiguates when the level is greater than 0.<br>
> > > ><br>
> > ><br>
> > > Only if we never render to it with CCS_D enabled on miplevels > 0. I'm<br>
> > not<br>
> > > sure if that's guaranteed by the current code.<br>
> > ><br>
> ><br>
> > You're right. Could you put a perf_warn for rendering with CCS_D enabled<br>
> > on miplevels > 0 for gen8 in the patch,<br>
> > "anv: Only fast clear single-slice images"?<br>
> ><br>
><br>
> We already warn for everything that isn't just single-slice rendering to<br>
> the first slice. We could make a special warning for gen8+ but I don't<br>
> really see a reason.<br>
><br>
<br>
</div></div>Where do we do this? As far as I know, we only warn for fast-clears that<br>
get turned off due to the target not being the first slice.<br>
<div><div class="h5"><br>
> --Jason<br>
><br>
><br>
> > -Nanley<br>
> ><br>
> > > --Jason<br>
> > ><br>
> > ><br>
> > > > -Nanley<br>
> > > ><br>
> > > > > - genX(set_image_needs_resolve)(<wbr>cmd_buffer, image,<br>
> > > > > - aspect, level, false);<br>
> > > > > +<br>
> > > > > + if (image->planes[plane].aux_<wbr>usage ==<br>
> > ISL_AUX_USAGE_CCS_E)<br>
> > > > {<br>
> > > > > + set_image_compressed_bit(cmd_<wbr>buffer, image, aspect,<br>
> > > > > + level, base_layer,<br>
> > > > level_layer_count,<br>
> > > > > + false);<br>
> > > > > + }<br>
> > > > > }<br>
> > > > > } else {<br>
> > > > > if (image->samples == 4 || image->samples == 16) {<br>
> > > > > @@ -812,19 +978,17 @@ transition_color_buffer(struct anv_cmd_buffer<br>
> > > > *cmd_buffer,<br>
> > > > > cmd_buffer->state.pending_<wbr>pipe_bits |=<br>
> > > > > ANV_PIPE_RENDER_TARGET_CACHE_<wbr>FLUSH_BIT |<br>
> > ANV_PIPE_CS_STALL_BIT;<br>
> > > > ><br>
> > > > > - for (uint32_t level = base_level; level < last_level_num;<br>
> > level++) {<br>
> > > > > + for (uint32_t l = 0; l < level_count; l++) {<br>
> > > > > + uint32_t level = base_level + l;<br>
> > > > > + uint32_t level_layer_count =<br>
> > > > > + MIN2(layer_count, anv_image_aux_layers(image, aspect,<br>
> > level));<br>
> > > > ><br>
> > > > > - /* The number of layers changes at each 3D miplevel. */<br>
> > > > > - if (image->type == VK_IMAGE_TYPE_3D) {<br>
> > > > > - layer_count = MIN2(layer_count, anv_image_aux_layers(image,<br>
> > > > aspect, level));<br>
> > > > > + for (uint32_t a = 0; a < level_layer_count; a++) {<br>
> > > > > + uint32_t array_layer = base_layer + a;<br>
> > > > > + anv_cmd_predicated_ccs_<wbr>resolve(cmd_buffer, image, aspect,<br>
> > > > > + level, array_layer,<br>
> > resolve_op,<br>
> > > > > + final_fast_clear);<br>
> > > > > }<br>
> > > > > -<br>
> > > > > - genX(load_needs_resolve_<wbr>predicate)(cmd_buffer, image, aspect,<br>
> > > > level);<br>
> > > > > -<br>
> > > > > - anv_image_ccs_op(cmd_buffer, image, aspect, level,<br>
> > > > > - base_layer, layer_count, resolve_op, true);<br>
> > > > > -<br>
> > > > > - genX(set_image_needs_resolve)(<wbr>cmd_buffer, image, aspect,<br>
> > level,<br>
> > > > false);<br>
> > > > > }<br>
> > > > ><br>
> > > > > cmd_buffer->state.pending_<wbr>pipe_bits |=<br>
> > > > > @@ -1488,12 +1652,20 @@ void genX(CmdPipelineBarrier)(<br>
> > > > > anv_image_expand_aspects(<wbr>image, range->aspectMask);<br>
> > > > > uint32_t aspect_bit;<br>
> > > > ><br>
> > > > > + uint32_t base_layer, layer_count;<br>
> > > > > + if (image->type == VK_IMAGE_TYPE_3D) {<br>
> > > > > + base_layer = 0;<br>
> > > > > + layer_count = anv_minify(image->extent.<wbr>depth,<br>
> > > > range->baseMipLevel);<br>
> > > > > + } else {<br>
> > > > > + base_layer = range->baseArrayLayer;<br>
> > > > > + layer_count = anv_get_layerCount(image, range);<br>
> > > > > + }<br>
> > > > > +<br>
> > > > > anv_foreach_image_aspect_bit(<wbr>aspect_bit, image,<br>
> > > > color_aspects) {<br>
> > > > > transition_color_buffer(cmd_<wbr>buffer, image, 1UL <<<br>
> > > > aspect_bit,<br>
> > > > > range->baseMipLevel,<br>
> > > > > anv_get_levelCount(image,<br>
> > range),<br>
> > > > > - range->baseArrayLayer,<br>
> > > > > - anv_get_layerCount(image,<br>
> > range),<br>
> > > > > + base_layer, layer_count,<br>
> > > > > pImageMemoryBarriers[i].<br>
> > oldLayout,<br>
> > > > > pImageMemoryBarriers[i].<br>
> > newLayout);<br>
> > > > > }<br>
> > > > > @@ -3203,28 +3375,26 @@ cmd_buffer_subpass_sync_fast_<br>
> > clear_values(struct<br>
> > > > anv_cmd_buffer *cmd_buffer)<br>
> > > > > genX(copy_fast_clear_dwords)(<wbr>cmd_buffer,<br>
> > > > att_state->color.state,<br>
> > > > > iview->image,<br>
> > > > > VK_IMAGE_ASPECT_COLOR_BIT,<br>
> > > > > - iview->planes[0].isl.base_<br>
> > level,<br>
> > > > > true /* copy from ss */);<br>
> > > > ><br>
> > > > > /* Fast-clears impact whether or not a resolve will be<br>
> > > > necessary. */<br>
> > > > > - if (iview->image->planes[0].aux_<wbr>usage ==<br>
> > ISL_AUX_USAGE_CCS_E<br>
> > > > &&<br>
> > > > > - att_state->clear_color_is_<wbr>zero) {<br>
> > > > > + if (att_state->clear_color_is_<wbr>zero) {<br>
> > > > > /* This image always has the auxiliary buffer enabled.<br>
> > We<br>
> > > > can mark<br>
> > > > > * the subresource as not needing a resolve because the<br>
> > > > clear color<br>
> > > > > * will match what's in every RENDER_SURFACE_STATE<br>
> > object<br>
> > > > when it's<br>
> > > > > * being used for sampling.<br>
> > > > > */<br>
> > > > > - genX(set_image_needs_resolve)(<wbr>cmd_buffer, iview->image,<br>
> > > > > - VK_IMAGE_ASPECT_COLOR_BIT,<br>
> > > > > - iview->planes[0].isl.base_<br>
> > > > level,<br>
> > > > > - false);<br>
> > > > > + set_image_fast_clear_state(<wbr>cmd_buffer, iview->image,<br>
> > > > > + VK_IMAGE_ASPECT_COLOR_BIT,<br>
> > > > > +<br>
> > ANV_FAST_CLEAR_DEFAULT_VALUE);<br>
> > > > > } else {<br>
> > > > > - genX(set_image_needs_resolve)(<wbr>cmd_buffer, iview->image,<br>
> > > > > - VK_IMAGE_ASPECT_COLOR_BIT,<br>
> > > > > - iview->planes[0].isl.base_<br>
> > > > level,<br>
> > > > > - true);<br>
> > > > > + set_image_fast_clear_state(<wbr>cmd_buffer, iview->image,<br>
> > > > > + VK_IMAGE_ASPECT_COLOR_BIT,<br>
> > > > > + ANV_FAST_CLEAR_ANY);<br>
> > > > > }<br>
> > > > > - } else if (rp_att->load_op == VK_ATTACHMENT_LOAD_OP_LOAD) {<br>
> > > > > + } else if (rp_att->load_op == VK_ATTACHMENT_LOAD_OP_LOAD &&<br>
> > > > > + iview->planes[0].isl.base_<wbr>level == 0 &&<br>
> > > > > + iview->planes[0].isl.base_<wbr>array_layer == 0) {<br>
> > > > > /* The attachment may have been fast-cleared in a previous<br>
> > > > render<br>
> > > > > * pass and the value is needed now. Update the surface<br>
> > > > state(s).<br>
> > > > > *<br>
> > > > > @@ -3233,7 +3403,6 @@ cmd_buffer_subpass_sync_fast_<br>
> > clear_values(struct<br>
> > > > anv_cmd_buffer *cmd_buffer)<br>
> > > > > genX(copy_fast_clear_dwords)(<wbr>cmd_buffer,<br>
> > > > att_state->color.state,<br>
> > > > > iview->image,<br>
> > > > > VK_IMAGE_ASPECT_COLOR_BIT,<br>
</div></div>> > > > > - iview->planes[0].isl.base_<br>
<span class="im HOEnZb">> > level,<br>
> > > > > false /* copy to ss */);<br>
> > > > ><br>
</span><div class="HOEnZb"><div class="h5">> > > > > if (need_input_attachment_state(<wbr>rp_att) &&<br>
> > > > > @@ -3241,7 +3410,6 @@ cmd_buffer_subpass_sync_fast_<br>
> > clear_values(struct<br>
> > > > anv_cmd_buffer *cmd_buffer)<br>
> > > > > genX(copy_fast_clear_dwords)(<wbr>cmd_buffer,<br>
> > > > att_state->input.state,<br>
> > > > > iview->image,<br>
> > > > > VK_IMAGE_ASPECT_COLOR_BIT,<br>
> > > > > - iview->planes[0].isl.base_<br>
> > > > level,<br>
> > > > > false /* copy to ss */);<br>
> > > > > }<br>
> > > > > }<br>
> > > > > --<br>
> > > > > 2.5.0.400.gff86faf<br>
> > > > ><br>
> > > > > ______________________________<wbr>_________________<br>
> > > > > mesa-dev mailing list<br>
> > > > > <a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a><br>
> > > > > <a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/<wbr>mailman/listinfo/mesa-dev</a><br>
> > > ><br>
> ><br>
</div></div></blockquote></div><br></div></div>