[Mesa-dev] [PATCH v3 20/24] anv/cmd_buffer: Rework aux tracking
Nanley Chery
nanleychery at gmail.com
Wed Feb 7 23:17:35 UTC 2018
On Wed, Feb 07, 2018 at 03:02:12PM -0800, Jason Ekstrand wrote:
> On Wed, Feb 7, 2018 at 1:39 PM, Nanley Chery <nanleychery at gmail.com> wrote:
>
> > On Wed, Feb 07, 2018 at 11:27:38AM -0800, Jason Ekstrand wrote:
> > > On Wed, Feb 7, 2018 at 9:59 AM, Nanley Chery <nanleychery at gmail.com>
> > wrote:
> > >
> > > > On Tue, Feb 06, 2018 at 05:57:26PM -0800, Jason Ekstrand wrote:
> > > > > On Tue, Feb 6, 2018 at 5:09 PM, Nanley Chery <nanleychery at gmail.com>
> > > > wrote:
> > > > >
> > > > > > On Mon, Feb 05, 2018 at 06:16:26PM -0800, Jason Ekstrand wrote:
> > > > > > > This commit completely reworks aux tracking. This includes a
> > number
> > > > of
> > > > > > > somewhat distinct changes:
> > > > > > >
> > > > > > > 1) Since we are no longer fast-clearing multiple slices, we only
> > > > need
> > > > > > > to track one fast clear color and one fast clear type.
> > > > > > >
> > > > > > > 2) We store two bits for fast clear instead of one to let us
> > > > > > > distinguish between zero and non-zero fast clear colors.
> > This is
> > > > > > > needed so that we can do full resolves when transitioning to
> > > > > > > PRESENT_SRC_KHR with gen9 CCS images where we allow zero
> > clear
> > > > > > > values in all sorts of places we wouldn't normally.
> > > > > > >
> > > > > > > 3) We now track compression state as a boolean separate from
> > fast
> > > > clear
> > > > > > > type and this is tracked on a per-slice granularity.
> > > > > > >
> > > > > > > The previous scheme had some issues when it came to individual
> > > > slices of
> > > > > > > a multi-LOD images. In particular, we only tracked "needs
> > resolve"
> > > > > > > per-LOD but you could do a vkCmdPipelineBarrier that would only
> > > > resolve
> > > > > > > a portion of the image and would set "needs resolve" to false
> > anyway.
> > > > > > > Also, any transition from an undefined layout would reset the
> > clear
> > > > > > > color for the entire LOD regardless of whether or not there was
> > some
> > > > > > > clear color on some other slice.
> > > > > > >
> > > > > > > As far as full/partial resolves go, he assumptions of the
> > previous
> > > > > > > scheme held because the one case where we do need a full resolve
> > when
> > > > > > > CCS_E is enabled is for window-system images. Since we only ever
> > > > > > > allowed X-tiled window-system images, CCS was entirely disabled
> > on
> > > > gen9+
> > > > > > > and we never got CCS_E. With the advent of Y-tiled window-system
> > > > > > > buffers, we now need to properly support doing a full resolve of
> > > > images
> > > > > > > marked CCS_E.
> > > > > > >
> > > > > > > v2 (Jason Ekstrand):
> > > > > > > - Fix an bug in the compressed flag offset calculation
> > > > > > > - Treat 3D images as multi-slice for the purposes of resolve
> > > > tracking
> > > > > > >
> > > > > > > Reviewed-by: Topi Pohjolainen <topi.pohjolainen at intel.com>
> > > > > > > ---
> > > > > > > src/intel/vulkan/anv_blorp.c | 3 +-
> > > > > > > src/intel/vulkan/anv_image.c | 100 ++++++-----
> > > > > > > src/intel/vulkan/anv_private.h | 60 ++++---
> > > > > > > src/intel/vulkan/genX_cmd_buffer.c | 340
> > > > +++++++++++++++++++++++++++---
> > > > > > -------
> > > > > > > 4 files changed, 345 insertions(+), 158 deletions(-)
> > > > > > >
> > > > > > > diff --git a/src/intel/vulkan/anv_blorp.c
> > > > b/src/intel/vulkan/anv_blorp.c
> > > > > > > index 497ae6f..fc3b717 100644
> > > > > > > --- a/src/intel/vulkan/anv_blorp.c
> > > > > > > +++ b/src/intel/vulkan/anv_blorp.c
> > > > > > > @@ -1758,8 +1758,7 @@ anv_image_ccs_op(struct anv_cmd_buffer
> > > > *cmd_buffer,
> > > > > > > * particular value and don't care about format or clear
> > > > value.
> > > > > > > */
> > > > > > > const struct anv_address clear_color_addr =
> > > > > > > - anv_image_get_clear_color_addr(cmd_buffer->device,
> > image,
> > > > > > > - aspect, level);
> > > > > > > + anv_image_get_clear_color_addr(cmd_buffer->device,
> > image,
> > > > > > aspect);
> > > > > > > surf.clear_color_addr = anv_to_blorp_address(clear_
> > > > color_addr);
> > > > > > > }
> > > > > > >
> > > > > > > diff --git a/src/intel/vulkan/anv_image.c
> > > > b/src/intel/vulkan/anv_image.c
> > > > > > > index 11942d0..011e952 100644
> > > > > > > --- a/src/intel/vulkan/anv_image.c
> > > > > > > +++ b/src/intel/vulkan/anv_image.c
> > > > > > > @@ -190,46 +190,54 @@ all_formats_ccs_e_compatible(const struct
> > > > > > gen_device_info *devinfo,
> > > > > > > * fast-clear values in non-trivial cases (e.g., outside of a
> > render
> > > > > > pass in
> > > > > > > * which a fast clear has occurred).
> > > > > > > *
> > > > > > > - * For the purpose of discoverability, the algorithm used to
> > manage
> > > > > > this buffer
> > > > > > > - * is described here. A clear value in this buffer is updated
> > when a
> > > > > > fast clear
> > > > > > > - * is performed on a subresource. One of two synchronization
> > > > operations
> > > > > > is
> > > > > > > - * performed in order for a following memory access to use the
> > > > > > fast-clear
> > > > > > > - * value:
> > > > > > > - * a. Copy the value from the buffer to the surface state
> > object
> > > > > > used for
> > > > > > > - * reading. This is done implicitly when the value is the
> > > > clear
> > > > > > value
> > > > > > > - * predetermined to be the default in other surface state
> > > > > > objects. This
> > > > > > > - * is currently only done explicitly for the operation
> > below.
> > > > > > > - * b. Do (a) and use the surface state object to resolve the
> > > > > > subresource.
> > > > > > > - * This is only done during layout transitions for decent
> > > > > > performance.
> > > > > > > + * In order to avoid having multiple clear colors for a single
> > > > plane of
> > > > > > an
> > > > > > > + * image (hence a single RENDER_SURFACE_STATE), we only allow
> > > > > > fast-clears on
> > > > > > > + * the first slice (level 0, layer 0). At the time of our
> > testing
> > > > (Jan
> > > > > > 17,
> > > > > > > + * 2018), there were no known applications which would benefit
> > from
> > > > > > fast-
> > > > > > > + * clearing more than just the first slice.
> > > > > > > *
> > > > > > > - * With the above scheme, we can fast-clear whenever the
> > hardware
> > > > > > allows except
> > > > > > > - * for two cases in which synchronization becomes impossible or
> > > > > > undesirable:
> > > > > > > - * * The subresource is in the GENERAL layout and is cleared
> > to a
> > > > > > value
> > > > > > > - * other than the special default value.
> > > > > > > + * The fast clear portion of the image is laid out in the
> > following
> > > > > > order:
> > > > > > > *
> > > > > > > - * Performing a synchronization operation in order to read
> > > > from the
> > > > > > > - * subresource is undesirable in this case. Firstly, b) is
> > not
> > > > an
> > > > > > option
> > > > > > > - * because a layout transition isn't required between a
> > write
> > > > and
> > > > > > read of
> > > > > > > - * an image in the GENERAL layout. Secondly, it's
> > undesirable
> > > > to
> > > > > > do a)
> > > > > > > - * explicitly because it would require large
> > infrastructural
> > > > > > changes. The
> > > > > > > - * Vulkan API supports us in deciding not to optimize this
> > > > layout
> > > > > > by
> > > > > > > - * stating that using this layout may cause suboptimal
> > > > > > performance. NOTE:
> > > > > > > - * the auxiliary buffer must always be enabled to support
> > a)
> > > > > > implicitly.
> > > > > > > + * * 1 or 4 dwords (depending on hardware generation) for the
> > clear
> > > > > > color
> > > > > > > + * * 1 dword for the anv_fast_clear_type of the clear color
> > > > > > > + * * On gen9+, 1 dword per level and layer of the image (3D
> > levels
> > > > > > count
> > > > > > > + * multiple layers) in level-major order for compression
> > state.
> > > > > > > *
> > > > > > > + * For the purpose of discoverability, the algorithm used to
> > manage
> > > > > > > + * compression and fast-clears is described here:
> > > > > > > *
> > > > > > > - * * For the given miplevel, only some of the layers are
> > cleared
> > > > at
> > > > > > once.
> > > > > > > + * * On a transition from UNDEFINED or PREINITIALIZED to a
> > defined
> > > > > > layout,
> > > > > > > + * all of the values in the fast clear portion of the image
> > are
> > > > > > initialized
> > > > > > > + * to default values.
> > > > > > > *
> > > > > > > - * If the user clears each layer to a different value, then
> > > > tries
> > > > > > to
> > > > > > > - * render to multiple layers at once, we have no ability to
> > > > > > perform a
> > > > > > > - * synchronization operation in between. a) is not helpful
> > > > because
> > > > > > the
> > > > > > > - * object can only hold one clear value. b) is not an
> > option
> > > > > > because a
> > > > > > > - * layout transition isn't required in this case.
> > > > > > > + * * On fast-clear, the clear value is written into surface
> > state
> > > > and
> > > > > > also
> > > > > > > + * into the buffer and the fast clear type is set
> > appropriately.
> > > > > > Both
> > > > > > > + * setting the fast-clear value in the buffer and setting the
> > > > > > fast-clear
> > > > > > > + * type happen from the GPU using MI commands.
> > > > > >
> > > > > > There's no mention about setting the compression dwords during
> > > > > > fast-clear operations or on render target writes.
> > > > > >
> > > > >
> > > > > I've added another bullet:
> > > > >
> > > > > * * Whenever a render or blorp operation is performed with CCS_E,
> > we
> > > > call
> > > > > * anv_cmd_buffer_mark_image_written to set the compression
> > state to
> > > > 1.
> > > > >
> > > > >
> > > >
> > > > That'll work. I think we should replace
> > > > anv_cmd_buffer_mark_image_written with
> > > > genX(cmd_buffer_mark_image_written) though, since the former's a
> > wrapper
> > > > that only gets called for blorp operations.
> > > >
> > >
> > > Done.
> > >
> > >
> > > > > > > + *
> > > > > > > + * * On pipeline barrier transitions, the worst-case
> > transition is
> > > > > > computed
> > > > > > > + * from the image layouts. The command streamer inspects the
> > > > fast
> > > > > > clear
> > > > > > > + * type and compression state dwords and constructs a
> > > > predicate. The
> > > > > > > + * worst-case resolve is performed with the given predicate
> > and
> > > > the
> > > > > > fast
> > > > > > > + * clear and compression state is set accordingly.
> > > > > > > + *
> > > > > > > + * See anv_layout_to_aux_usage and anv_layout_to_fast_clear_type
> > > > > > functions for
> > > > > > > + * details on exactly what is allowed in what layouts.
> > > > > > > + *
> > > > > > > + * On gen7-9, we do not have a concept of indirect clear colors
> > in
> > > > > > hardware.
> > > > > > > + * In order to deal with this, we have to do some clear color
> > > > > > management.
> > > > > > > + *
> > > > > > > + * * For LOAD_OP_LOAD at the top of a renderpass, we have to
> > copy
> > > > the
> > > > > > clear
> > > > > > > + * value from the buffer into the surface state with MI
> > commands.
> > > > > > > + *
> > > > > > > + * * For any blorp operations, we pass the address to the clear
> > > > value
> > > > > > into
> > > > > > > + * blorp and it knows to copy the clear color.
> > > > > > > */
> > > > > > > static void
> > > > > > > -add_fast_clear_state_buffer(struct anv_image *image,
> > > > > > > - VkImageAspectFlagBits aspect,
> > > > > > > - uint32_t plane,
> > > > > > > - const struct anv_device *device)
> > > > > > > +add_aux_state_tracking_buffer(struct anv_image *image,
> > > > > > > + VkImageAspectFlagBits aspect,
> > > > > > > + uint32_t plane,
> > > > > > > + const struct anv_device *device)
> > > > > >
> > > > > > The comment referencing this function in genX_cmd_buffer.c needs
> > to be
> > > > > > updated.
> > > > > >
> > > > >
> > > > > I don't see that comment anywhere.
> > > > >
> > > > >
> > > >
> > > > Maybe you delete it in a future series? This is the one I have in mind:
> > > >
> > > > /* We only allow fast clears in the GENERAL layout if the auxiliary
> > > > * buffer is always enabled and the fast-clear value is all 0's. See
> > > > * add_fast_clear_state_buffer() for more information.
> > > > */
> > > >
> > >
> > > Right. Somehow my grepping missed it. Fixed.
> > >
> > >
> > > > > > > {
> > > > > > > assert(image && device);
> > > > > > > assert(image->planes[plane].aux_surface.isl.size > 0 &&
> > > > > > > @@ -251,20 +259,24 @@ add_fast_clear_state_buffer(struct
> > anv_image
> > > > > > *image,
> > > > > > > (image->planes[plane].offset +
> > > > image->planes[plane].size));
> > > > > > > }
> > > > > > >
> > > > > > > - const unsigned entry_size = anv_fast_clear_state_entry_
> > > > size(device);
> > > > > > > - /* There's no padding between entries, so ensure that they're
> > > > always
> > > > > > a
> > > > > > > - * multiple of 32 bits in order to enable GPU memcpy
> > operations.
> > > > > > > - */
> > > > > > > - assert(entry_size % 4 == 0);
> > > > > > > + /* Clear color and fast clear type */
> > > > > > > + unsigned state_size = device->isl_dev.ss.clear_value_size +
> > 4;
> > > > > > >
> > > > > > > - const unsigned plane_state_size =
> > > > > > > - entry_size * anv_image_aux_levels(image, aspect);
> > > > > > > + /* We only need to track compression on CCS_E surfaces. */
> > > > > > > + if (image->planes[plane].aux_usage == ISL_AUX_USAGE_CCS_E) {
> > > > > > > + if (image->type == VK_IMAGE_TYPE_3D) {
> > > > > > > + for (uint32_t l = 0; l < image->levels; l++)
> > > > > > > + state_size += anv_minify(image->extent.depth, l) *
> > 4;
> > > > > > > + } else {
> > > > > > > + state_size += image->levels * image->array_size * 4;
> > > > > > > + }
> > > > > > > + }
> > > > > > >
> > > > > > > image->planes[plane].fast_clear_state_offset =
> > > > > > > image->planes[plane].offset + image->planes[plane].size;
> > > > > > >
> > > > > > > - image->planes[plane].size += plane_state_size;
> > > > > > > - image->size += plane_state_size;
> > > > > > > + image->planes[plane].size += state_size;
> > > > > > > + image->size += state_size;
> > > > > > > }
> > > > > > >
> > > > > > > /**
> > > > > > > @@ -437,7 +449,7 @@ make_surface(const struct anv_device *dev,
> > > > > > > }
> > > > > > >
> > > > > > > add_surface(image, &image->planes[plane].aux_
> > surface,
> > > > > > plane);
> > > > > > > - add_fast_clear_state_buffer(image, aspect, plane,
> > dev);
> > > > > > > + add_aux_state_tracking_buffer(image, aspect, plane,
> > > > dev);
> > > > > > >
> > > > > > > /* For images created without MUTABLE_FORMAT_BIT
> > set, we
> > > > > > know that
> > > > > > > * they will always be used with the original
> > format.
> > > > In
> > > > > > > @@ -461,7 +473,7 @@ make_surface(const struct anv_device *dev,
> > > > > > > &image->planes[plane].aux_
> > > > > > surface.isl);
> > > > > > > if (ok) {
> > > > > > > add_surface(image, &image->planes[plane].aux_surface,
> > > > plane);
> > > > > > > - add_fast_clear_state_buffer(image, aspect, plane,
> > dev);
> > > > > > > + add_aux_state_tracking_buffer(image, aspect, plane,
> > dev);
> > > > > > > image->planes[plane].aux_usage = ISL_AUX_USAGE_MCS;
> > > > > > > }
> > > > > > > }
> > > > > > > diff --git a/src/intel/vulkan/anv_private.h
> > b/src/intel/vulkan/anv_
> > > > > > private.h
> > > > > > > index 5f82702..d38dd9e 100644
> > > > > > > --- a/src/intel/vulkan/anv_private.h
> > > > > > > +++ b/src/intel/vulkan/anv_private.h
> > > > > > > @@ -2533,50 +2533,58 @@ anv_image_aux_layers(const struct
> > anv_image *
> > > > > > const image,
> > > > > > > }
> > > > > > > }
> > > > > > >
> > > > > > > -static inline unsigned
> > > > > > > -anv_fast_clear_state_entry_size(const struct anv_device
> > *device)
> > > > > > > -{
> > > > > > > - assert(device);
> > > > > > > - /* Entry contents:
> > > > > > > - * +--------------------------------------------+
> > > > > > > - * | clear value dword(s) | needs resolve dword |
> > > > > > > - * +--------------------------------------------+
> > > > > > > - */
> > > > > > > -
> > > > > > > - /* Ensure that the needs resolve dword is in fact
> > dword-aligned
> > > > to
> > > > > > enable
> > > > > > > - * GPU memcpy operations.
> > > > > > > - */
> > > > > > > - assert(device->isl_dev.ss.clear_value_size % 4 == 0);
> > > > > > > - return device->isl_dev.ss.clear_value_size + 4;
> > > > > > > -}
> > > > > > > -
> > > > > > > static inline struct anv_address
> > > > > > > anv_image_get_clear_color_addr(const struct anv_device *device,
> > > > > > > const struct anv_image *image,
> > > > > > > - VkImageAspectFlagBits aspect,
> > > > > > > - unsigned level)
> > > > > > > + VkImageAspectFlagBits aspect)
> > > > > > > {
> > > > > > > + assert(image->aspects & VK_IMAGE_ASPECT_ANY_COLOR_BIT_ANV);
> > > > > > > +
> > > > > > > uint32_t plane = anv_image_aspect_to_plane(image->aspects,
> > > > aspect);
> > > > > > > return (struct anv_address) {
> > > > > > > .bo = image->planes[plane].bo,
> > > > > > > .offset = image->planes[plane].bo_offset +
> > > > > > > - image->planes[plane].fast_clear_state_offset +
> > > > > > > - anv_fast_clear_state_entry_size(device) *
> > level,
> > > > > > > + image->planes[plane].fast_clear_state_offset,
> > > > > > > };
> > > > > > > }
> > > > > > >
> > > > > > > static inline struct anv_address
> > > > > > > -anv_image_get_needs_resolve_addr(const struct anv_device
> > *device,
> > > > > > > - const struct anv_image *image,
> > > > > > > - VkImageAspectFlagBits aspect,
> > > > > > > - unsigned level)
> > > > > > > +anv_image_get_fast_clear_type_addr(const struct anv_device
> > *device,
> > > > > > > + const struct anv_image
> > *image,
> > > > > > > + VkImageAspectFlagBits aspect)
> > > > > > > {
> > > > > > > struct anv_address addr =
> > > > > > > - anv_image_get_clear_color_addr(device, image, aspect,
> > level);
> > > > > > > + anv_image_get_clear_color_addr(device, image, aspect);
> > > > > > > addr.offset += device->isl_dev.ss.clear_value_size;
> > > > > > > return addr;
> > > > > > > }
> > > > > > >
> > > > > > > +static inline struct anv_address
> > > > > > > +anv_image_get_compression_state_addr(const struct anv_device
> > > > *device,
> > > > > > > + const struct anv_image
> > *image,
> > > > > > > + VkImageAspectFlagBits
> > aspect,
> > > > > > > + uint32_t level, uint32_t
> > > > > > array_layer)
> > > > > > > +{
> > > > > > > + assert(level < anv_image_aux_levels(image, aspect));
> > > > > > > + assert(array_layer < anv_image_aux_layers(image, aspect,
> > level));
> > > > > > > + UNUSED uint32_t plane = anv_image_aspect_to_plane(
> > > > image->aspects,
> > > > > > aspect);
> > > > > > > + assert(image->planes[plane].aux_usage ==
> > ISL_AUX_USAGE_CCS_E);
> > > > > > > +
> > > > > > > + struct anv_address addr =
> > > > > > > + anv_image_get_fast_clear_type_addr(device, image,
> > aspect);
> > > > > > > + addr.offset += 4; /* Go past the fast clear type */
> > > > > > > +
> > > > > > > + if (image->type == VK_IMAGE_TYPE_3D) {
> > > > > > > + for (uint32_t l = 0; l < level; l++)
> > > > > > > + addr.offset += anv_minify(image->extent.depth, l) * 4;
> > > > > > > + } else {
> > > > > > > + addr.offset += level * image->array_size * 4;
> > > > > > > + }
> > > > > > > + addr.offset += array_layer * 4;
> > > > > > > +
> > > > > > > + return addr;
> > > > > > > +}
> > > > > > > +
> > > > > > > /* Returns true if a HiZ-enabled depth buffer can be sampled
> > from.
> > > > */
> > > > > > > static inline bool
> > > > > > > anv_can_sample_with_hiz(const struct gen_device_info * const
> > > > devinfo,
> > > > > > > diff --git a/src/intel/vulkan/genX_cmd_buffer.c
> > > > > > b/src/intel/vulkan/genX_cmd_buffer.c
> > > > > > > index e1a4d95..4c75e0c 100644
> > > > > > > --- a/src/intel/vulkan/genX_cmd_buffer.c
> > > > > > > +++ b/src/intel/vulkan/genX_cmd_buffer.c
> > > > > > > @@ -407,27 +407,45 @@ transition_depth_buffer(struct
> > anv_cmd_buffer
> > > > > > *cmd_buffer,
> > > > > > > #define MI_PREDICATE_SRC0 0x2400
> > > > > > > #define MI_PREDICATE_SRC1 0x2408
> > > > > > >
> > > > > > > -/* Manages the state of an color image subresource to ensure
> > > > resolves
> > > > > > are
> > > > > > > - * performed properly.
> > > > > > > - */
> > > > > > > static void
> > > > > > > -genX(set_image_needs_resolve)(struct anv_cmd_buffer
> > *cmd_buffer,
> > > > > > > - const struct anv_image *image,
> > > > > > > - VkImageAspectFlagBits aspect,
> > > > > > > - unsigned level, bool needs_resolve)
> > > > > > > +set_image_fast_clear_state(struct anv_cmd_buffer *cmd_buffer,
> > > > > > > + const struct anv_image *image,
> > > > > > > + VkImageAspectFlagBits aspect,
> > > > > > > + enum anv_fast_clear_type fast_clear)
> > > > > > > {
> > > > > > > - assert(cmd_buffer && image);
> > > > > > > - assert(image->aspects & VK_IMAGE_ASPECT_ANY_COLOR_BIT_ANV);
> > > > > > > - assert(level < anv_image_aux_levels(image, aspect));
> > > > > > > -
> > > > > > > - /* The HW docs say that there is no way to guarantee the
> > > > completion
> > > > > > of
> > > > > > > - * the following command. We use it nevertheless because it
> > > > shows no
> > > > > > > - * issues in testing is currently being used in the GL
> > driver.
> > > > > > > - */
> > > > > > > anv_batch_emit(&cmd_buffer->batch, GENX(MI_STORE_DATA_IMM),
> > > > sdi) {
> > > > > > > - sdi.Address = anv_image_get_needs_resolve_
> > > > > > addr(cmd_buffer->device,
> > > > > > > - image,
> > aspect,
> > > > > > level);
> > > > > > > - sdi.ImmediateData = needs_resolve;
> > > > > > > + sdi.Address = anv_image_get_fast_clear_type_
> > > > > > addr(cmd_buffer->device,
> > > > > > > + image,
> > > > aspect);
> > > > > > > + sdi.ImmediateData = fast_clear;
> > > > > > > + }
> > > > > > > +}
> > > > > > > +
> > > > > > > +static void
> > > > > > > +set_image_compressed_bit(struct anv_cmd_buffer *cmd_buffer,
> > > > > > > + const struct anv_image *image,
> > > > > > > + VkImageAspectFlagBits aspect,
> > > > > > > + uint32_t level,
> > > > > > > + uint32_t base_layer, uint32_t
> > layer_count,
> > > > > > > + bool compressed)
> > > > > > > +{
> > > > > > > + /* We only have CCS_E on gen9+ */
> > > > > > > + if (GEN_GEN < 9)
> > > > > > > + return;
> > > > > > > +
> > > > > >
> > > > > > I'm also fine with dropping this if statement.
> > > > > >
> > > > >
> > > > > Ok, we'll drop it.
> > > > >
> > > > >
> > > > > > > + uint32_t plane = anv_image_aspect_to_plane(image->aspects,
> > > > aspect);
> > > > > > > +
> > > > > > > + /* We only have compression tracking for CCS_E */
> > > > > > > + if (image->planes[plane].aux_usage != ISL_AUX_USAGE_CCS_E)
> > > > > > > + return;
> > > > > > > +
> > > > > > > + for (uint32_t a = 0; a < layer_count; a++) {
> > > > > > > + uint32_t layer = base_layer + a;
> > > > > > > + anv_batch_emit(&cmd_buffer->batch,
> > GENX(MI_STORE_DATA_IMM),
> > > > sdi)
> > > > > > {
> > > > > > > + sdi.Address = anv_image_get_compression_
> > > > > > state_addr(cmd_buffer->device,
> > > > > > > +
> > image,
> > > > > > aspect,
> > > > > > > +
> > level,
> > > > > > layer);
> > > > > > > + sdi.ImmediateData = compressed ? UINT32_MAX : 0;
> > > > > > > + }
> > > > > > > }
> > > > > > > }
> > > > > > >
> > > > > > > @@ -451,32 +469,176 @@ mi_alu(uint32_t opcode, uint32_t operand1,
> > > > > > uint32_t operand2)
> > > > > > > #define CS_GPR(n) (0x2600 + (n) * 8)
> > > > > > >
> > > > > > > static void
> > > > > > > -genX(load_needs_resolve_predicate)(struct anv_cmd_buffer
> > > > *cmd_buffer,
> > > > > > > - const struct anv_image
> > *image,
> > > > > > > - VkImageAspectFlagBits aspect,
> > > > > > > - unsigned level)
> > > > > > > +anv_cmd_predicated_ccs_resolve(struct anv_cmd_buffer
> > *cmd_buffer,
> > > > > > > + const struct anv_image *image,
> > > > > > > + VkImageAspectFlagBits aspect,
> > > > > > > + uint32_t level, uint32_t
> > array_layer,
> > > > > > > + enum isl_aux_op resolve_op,
> > > > > > > + enum anv_fast_clear_type
> > > > > > fast_clear_supported)
> > > > > > > {
> > > > > > > - assert(cmd_buffer && image);
> > > > > > > - assert(image->aspects & VK_IMAGE_ASPECT_ANY_COLOR_BIT_ANV);
> > > > > > > - assert(level < anv_image_aux_levels(image, aspect));
> > > > > > > + struct anv_address fast_clear_type_addr =
> > > > > > > + anv_image_get_fast_clear_type_addr(cmd_buffer->device,
> > image,
> > > > > > aspect);
> > > > > > > +
> > > > > > > +#if GEN_GEN >= 9
> > > > > > > + const uint32_t plane = anv_image_aspect_to_plane(
> > image->aspects,
> > > > > > aspect);
> > > > > > > + const bool decompress =
> > > > > > > + resolve_op == ISL_AUX_OP_FULL_RESOLVE &&
> > > > > > > + image->planes[plane].aux_usage == ISL_AUX_USAGE_CCS_E;
> > > > > > > +
> > > > > > > + /* This function shouldn't get called if it isn't going to do
> > > > > > anything */
> > > > > > > + assert(decompress || fast_clear_supported <
> > ANV_FAST_CLEAR_ANY);
> > > > > > > +
> > > > > > > + if (level == 0 && array_layer == 0) {
> > > > > > > + /* This is the complex case because we have to worry about
> > > > > > dealing with
> > > > > > > + * the fast clear color. Unfortunately, it's also the
> > common
> > > > > > case.
> > > > > > > + */
> > > > > > > +
> > > > > > > + /* Poor-man's register allocation */
> > > > > > > + int next_reg = MI_ALU_REG0;
> > > > > > > + int pred_reg = -1;
> > > > > > > +
> > > > > > > + /* Needed for ALU operations */
> > > > > > > + uint32_t *dw;
> > > > > > > +
> > > > > > > + const int image_fc = next_reg++;
> > > > > > > + anv_batch_emit(&cmd_buffer->batch,
> > > > GENX(MI_LOAD_REGISTER_MEM),
> > > > > > lrm) {
> > > > > > > + lrm.RegisterAddress = CS_GPR(image_fc);
> > > > > > > + lrm.MemoryAddress = fast_clear_type_addr;
> > > > > > > + }
> > > > > > > + emit_lri(&cmd_buffer->batch, CS_GPR(image_fc) + 4, 0);
> > > > > > > +
> > > > > > > + if (fast_clear_supported < ANV_FAST_CLEAR_ANY) {
> > > > > > > + /* We need to compute (fast_clear_supported <
> > > > > > image->fast_clear).
> > > > > > > + * We do this by subtracting and storing the carry bit.
> > > > > > > + */
> > > > > > > + const int fc_imm = next_reg++;
> > > > > > > + emit_lri(&cmd_buffer->batch, CS_GPR(fc_imm),
> > > > > > fast_clear_supported);
> > > > > > > + emit_lri(&cmd_buffer->batch, CS_GPR(fc_imm) + 4, 0);
> > > > > > > +
> > > > > > > + assert(pred_reg == -1);
> > > > > > > + pred_reg = next_reg++;
> > > > > > > +
> > > > > > > + dw = anv_batch_emitn(&cmd_buffer->batch, 5,
> > > > GENX(MI_MATH));
> > > > > > > + dw[1] = mi_alu(MI_ALU_LOAD, MI_ALU_SRCA, fc_imm);
> > > > > > > + dw[2] = mi_alu(MI_ALU_LOAD, MI_ALU_SRCB, image_fc);
> > > > > > > + dw[3] = mi_alu(MI_ALU_SUB, 0, 0);
> > > > > > > + dw[4] = mi_alu(MI_ALU_STORE, pred_reg, MI_ALU_CF);
> > > > > > > + }
> > > > > > > +
> > > > > > > + if (decompress) {
> > > > > > > + /* If we're doing a full resolve, we need the
> > compression
> > > > > > state */
> > > > > > > + struct anv_address compression_state_addr =
> > > > > > > + anv_image_get_compression_
> > > > state_addr(cmd_buffer->device,
> > > > > > image,
> > > > > > > + aspect, level,
> > > > > > array_layer);
> > > > > > > + if (pred_reg == -1) {
> > > > > > > + pred_reg = next_reg++;
> > > > > > > + anv_batch_emit(&cmd_buffer->batch,
> > > > > > GENX(MI_LOAD_REGISTER_MEM), lrm) {
> > > > > > > + lrm.RegisterAddress = CS_GPR(pred_reg);
> > > > > > > + lrm.MemoryAddress = compression_state_addr;
> > > > > > > + }
> > > > > >
> > > > >
> > > > > In this case, we don't have a fast-clear predicate. This is because
> > "if
> > > > > (fast_clear_supported < ANV_FAST_CLEAR_ANY)" failed so we didn't
> > compute
> > > > > one. In this case, we treat the fast-clear predicate as being 0 and
> > OR
> > > > > with 0 is just the original value.
> > > > >
> > > > >
> > > > >
> > > > > > > + } else {
> > > > > > > + /* OR the compression state into the predicate. The
> > > > > > compression
> > > > > > > + * state is already in 0/~0 form.
> > > > > > > + */
> > > > > >
> > > > > > The OR is a result of this function's structure and wouldn't be
> > needed
> > > > > > if the control flow in this function was different right?
> > > > > >
> > > > >
> > > > > In this case, we have both a fast-clear predicate and a compression
> > > > > predicate. We have to OR them together.
> > > > >
> > > > > I'm not sure what you mean about control-flow in this function being
> > > > > different. Yes, I'm probably over-optimizing a bit to avoid
> > unneeded CS
> > > > > operations.
> > > > >
> > > > >
> > > >
> > > > By that, I'm referring to the organization of the if's and else's in
> > > > this function - though that still isn't very exact/clear. Sorry for the
> > > > confusion.
> > > >
> > > > It seems like the OR is one more CS operation than neeeded. It's always
> > > > the case that if we have some fast-clear data, the compression state
> > > > will always be true.
> > > >
> > >
> > > I'm not sure if that's always true or not. I think it is in practice
> > since
> > > we only fast clear at the top of render passes and we always call
> > > mark_image_written at the top of the render pass too. However, if the
> > > first use of a fast-cleared image is an input attachment, this may not be
> > > the case. Also, if we ever try to do something clever like you suggested
> > > with PS depth count, this won't be the case. In GL, it's easy because
> > all
> > > you have to do is a fast-clear and some rendering as sRGB and you get
> > into
> > > the PARTIAL_CLEAR state. In any case, I don't know that we want them to
> > be
> > > coupled.
> > >
> > >
> >
> > Clearing input attachments is not a problem because the attachment must
> > also be used as a color attachment in the same subpass:
> >
> > If the first use of an attachment in this render pass is as an input
> > attachment, and the attachment is not also used as a color or
> > depth/stencil attachment in the same subpass, then loadOp must not be
> > VK_ATTACHMENT_LOAD_OP_CLEAR
> >
> > Regardless of what we do with mark_image_written (use PS_DEPTH_COUNT,
> > call it during vkDraw* commands), we could update the compression state
> > as part of updating the fast clear state. What benefit would we get from
> > allowing FC to be a non-zero value and COMP to be zero?
> >
>
> It's useful in GL because it lets us avoid any sort of resolve when going
> from CCS_E to CCS_D. In Vulkan, maybe it's not useful.
>
>
Thanks for mentioning how GL benefits. I looked at the state diagram in
isl.h to understand why we had a PARTIAL_CLEAR state, but couldn't
figure it out.
> > > > Here's how I understand the states could look:
> > > > FC | COMP
> > > > ==========
> > > > 0 | 0
> > > > 0 | 1
> > > > 1 | 1
> > > > 2 | 1
> > > >
> > > > If that's the case we only have to check the compression_state when a
> > > > decompress is called for. If we moved the "if (decompress) {}" block of
> > > > the function higher we wouldn't need to OR the compression predicate
> > > > with the fast clear predicate.
> > > >
> > >
> > > Ugh... This is subtle... If the above table is accurate (which, as I
> > said
> > > above, I'm not convinced it is), then I think your statement there is
> > > correct.
> > >
> > > I'm a bit torn. On the one hand, this code is complicated and we could
> > > simplify it if we're willing to make the assumption above. On the other
> > > hand, there are a bunch of subtle interactions here and I don't like
> > subtle.
> >
>
> As much as I hate to admit it, I think you may be starting to convince me.
> :-) As long as setting fast clear state also sets compressed state (so
> they never get out-of-sync), I think we can go ahead and make that
> assumption. I'll see how it simplifies the code.
>
Yay :)
> --Jason
>
>
> > > > > > > + const int image_comp = next_reg++;
> > > > > > > + anv_batch_emit(&cmd_buffer->batch,
> > > > > > GENX(MI_LOAD_REGISTER_MEM), lrm) {
> > > > > > > + lrm.RegisterAddress = CS_GPR(image_comp);
> > > > > > > + lrm.MemoryAddress = compression_state_addr;
> > > > > > > + }
> > > > > > > +
> > > > > > > + dw = anv_batch_emitn(&cmd_buffer->batch, 5,
> > > > GENX(MI_MATH));
> > > > > > > + dw[1] = mi_alu(MI_ALU_LOAD, MI_ALU_SRCA, pred_reg);
> > > > > > > + dw[2] = mi_alu(MI_ALU_LOAD, MI_ALU_SRCB,
> > image_comp);
> > > > > > > + dw[3] = mi_alu(MI_ALU_OR, 0, 0);
> > > > > > > + dw[4] = mi_alu(MI_ALU_STORE, pred_reg, MI_ALU_ACCU);
> > > > > > > + }
> > > > > > > +
> > > > > > > + anv_batch_emit(&cmd_buffer->batch,
> > > > GENX(MI_STORE_DATA_IMM),
> > > > > > sdi) {
> > > > > > > + sdi.Address = compression_state_addr;
> > > > > > > + sdi.ImmediateData = 0;
> > > > > > > + }
> > > > > > > + }
> > > > > > >
> > > > > > > - const struct anv_address resolve_flag_addr =
> > > > > > > - anv_image_get_needs_resolve_addr(cmd_buffer->device,
> > > > > > > - image, aspect, level);
> > > > > > > + /* Store the predicate */
> > > > > > > + assert(pred_reg != -1);
> > > > > > > + emit_lrr(&cmd_buffer->batch, MI_PREDICATE_SRC0,
> > > > CS_GPR(pred_reg));
> > > > > > >
> > > > > > > - /* Make the pending predicated resolve a no-op if one is not
> > > > needed.
> > > > > > > - * predicate = do_resolve = resolve_flag != 0;
> > > > > > > + /* If the predicate is true, we want to write 0 to the
> > fast
> > > > clear
> > > > > > type
> > > > > > > + * and, if it's false, leave it alone. We can do this by
> > > > writing
> > > > > > > + *
> > > > > > > + * clear_type = clear_type & ~predicate;
> > > > > > > + */
> > > > > > > + dw = anv_batch_emitn(&cmd_buffer->batch, 5,
> > GENX(MI_MATH));
> > > > > > > + dw[1] = mi_alu(MI_ALU_LOAD, MI_ALU_SRCA, image_fc);
> > > > > > > + dw[2] = mi_alu(MI_ALU_LOADINV, MI_ALU_SRCB, pred_reg);
> > > > > > > + dw[3] = mi_alu(MI_ALU_AND, 0, 0);
> > > > > > > + dw[4] = mi_alu(MI_ALU_STORE, image_fc, MI_ALU_ACCU);
> > > > > > > +
> > > > > > > + anv_batch_emit(&cmd_buffer->batch,
> > > > GENX(MI_STORE_REGISTER_MEM),
> > > > > > srm) {
> > > > > > > + srm.RegisterAddress = CS_GPR(image_fc);
> > > > > > > + srm.MemoryAddress = fast_clear_type_addr;
> > > > > > > + }
> > > > > > > + } else if (decompress) {
> > > > > > > + /* We're trying to get rid of compression but we don't
> > care
> > > > about
> > > > > > fast
> > > > > > > + * clears so all we need is the compression predicate.
> > > > > > > + */
> > > > > > > + assert(resolve_op == ISL_AUX_OP_FULL_RESOLVE);
> > > > > > > + struct anv_address compression_state_addr =
> > > > > > > + anv_image_get_compression_
> > state_addr(cmd_buffer->device,
> > > > > > image,
> > > > > > > + aspect, level,
> > > > > > array_layer);
> > > > > > > + anv_batch_emit(&cmd_buffer->batch,
> > > > GENX(MI_LOAD_REGISTER_MEM),
> > > > > > lrm) {
> > > > > > > + lrm.RegisterAddress = MI_PREDICATE_SRC0;
> > > > > > > + lrm.MemoryAddress = compression_state_addr;
> > > > > > > + }
> > > > > > > + anv_batch_emit(&cmd_buffer->batch,
> > GENX(MI_STORE_DATA_IMM),
> > > > sdi)
> > > > > > {
> > > > > > > + sdi.Address = compression_state_addr;
> > > > > > > + sdi.ImmediateData = 0;
> > > > > > > + }
> > > >
> > >
> > > If we want them decoupled, we probably want to do "compressed |=
> > predicate"
> > > here.
> > >
> > >
> > > > > > > + } else {
> > > > > > > + /* In this case, we're trying to do a partial resolve on a
> > > > slice
> > > > > > that
> > > > > > > + * doesn't have clear color. There's nothing to do.
> > > > > > > + */
> > > > > > > + return;
> > > > > > > + }
> > > > > > > +
> > > > > > > +#else /* GEN_GEN <= 8 */
> > > > > > > + assert(resolve_op == ISL_AUX_OP_FULL_RESOLVE);
> > > > > > > + assert(fast_clear_supported != ANV_FAST_CLEAR_ANY);
> > > > > > > +
> > > > > > > + /* We don't support fast clears on anything other than the
> > first
> > > > > > slice. */
> > > > > > > + if (level > 0 || array_layer > 0)
> > > > > > > + return;
> > > > > > > +
> > > > > > > + /* On gen8, we don't have a concept of default clear colors
> > > > because
> > > > > > we
> > > > > > > + * can't sample from CCS surfaces. It's enough to just load
> > the
> > > > > > fast clear
> > > > > > > + * state into the predicate register.
> > > > > > > */
> > > > > > > + anv_batch_emit(&cmd_buffer->batch,
> > GENX(MI_LOAD_REGISTER_MEM),
> > > > lrm)
> > > > > > {
> > > > > > > + lrm.RegisterAddress = MI_PREDICATE_SRC0;
> > > > > > > + lrm.MemoryAddress = fast_clear_type_addr;
> > > > > > > + }
> > > > > > > + anv_batch_emit(&cmd_buffer->batch, GENX(MI_STORE_DATA_IMM),
> > > > sdi) {
> > > > > > > + sdi.Address = fast_clear_type_addr;
> > > > > > > + sdi.ImmediateData = 0;
> > > > > > > + }
> > > > > > > +#endif
> > > > > > > +
> > > > > > > + /* We use the first half of src0 for the actual predicate.
> > Set
> > > > the
> > > > > > second
> > > > > > > + * half of src0 and all of src1 to 0 as the predicate
> > operation
> > > > will
> > > > > > be
> > > > > > > + * doing an implicit src0 != src1.
> > > > > > > + */
> > > > > > > + emit_lri(&cmd_buffer->batch, MI_PREDICATE_SRC0 + 4, 0);
> > > > > > > emit_lri(&cmd_buffer->batch, MI_PREDICATE_SRC1 , 0);
> > > > > > > emit_lri(&cmd_buffer->batch, MI_PREDICATE_SRC1 + 4, 0);
> > > > > > > - emit_lri(&cmd_buffer->batch, MI_PREDICATE_SRC0 , 0);
> > > > > > > - emit_lrm(&cmd_buffer->batch, MI_PREDICATE_SRC0 + 4,
> > > > > > > - resolve_flag_addr.bo, resolve_flag_addr.offset);
> > > > > > > +
> > > > > > > anv_batch_emit(&cmd_buffer->batch, GENX(MI_PREDICATE), mip)
> > {
> > > > > > > mip.LoadOperation = LOAD_LOADINV;
> > > > > > > mip.CombineOperation = COMBINE_SET;
> > > > > > > mip.CompareOperation = COMPARE_SRCS_EQUAL;
> > > > > > > }
> > > > > > > +
> > > > > > > + anv_image_ccs_op(cmd_buffer, image, aspect, level,
> > > > > > > + array_layer, 1, resolve_op, true);
> > > > > > > }
> > > > > > >
> > > > > > > void
> > > > > > > @@ -490,17 +652,30 @@ genX(cmd_buffer_mark_image_written)(struct
> > > > > > anv_cmd_buffer *cmd_buffer,
> > > > > > > {
> > > > > > > /* The aspect must be exactly one of the image aspects. */
> > > > > > > assert(_mesa_bitcount(aspect) == 1 && (aspect &
> > image->aspects));
> > > > > > > +
> > > > > > > + /* The only compression types with more than just
> > fast-clears are
> > > > > > MCS,
> > > > > > > + * CCS_E, and HiZ. With HiZ we just trust the layout and
> > don't
> > > > > > actually
> > > > > > > + * track the current fast-clear and compression state. This
> > > > leaves
> > > > > > us
> > > > > > > + * with just MCS and CCS_E.
> > > > > > > + */
> > > > > > > + if (aux_usage != ISL_AUX_USAGE_CCS_E &&
> > > > > > > + aux_usage != ISL_AUX_USAGE_MCS)
> > > > > > > + return;
> > > > > > > +
> > > > > > > + set_image_compressed_bit(cmd_buffer, image, aspect,
> > > > > > > + level, base_layer, layer_count,
> > true);
> > > > > > > }
> > > > > > >
> > > > > > > static void
> > > > > > > -init_fast_clear_state_entry(struct anv_cmd_buffer *cmd_buffer,
> > > > > > > - const struct anv_image *image,
> > > > > > > - VkImageAspectFlagBits aspect,
> > > > > > > - unsigned level)
> > > > > > > +init_fast_clear_color(struct anv_cmd_buffer *cmd_buffer,
> > > > > > > + const struct anv_image *image,
> > > > > > > + VkImageAspectFlagBits aspect)
> > > > > > > {
> > > > > > > assert(cmd_buffer && image);
> > > > > > > assert(image->aspects & VK_IMAGE_ASPECT_ANY_COLOR_BIT_ANV);
> > > > > > > - assert(level < anv_image_aux_levels(image, aspect));
> > > > > > > +
> > > > > > > + set_image_fast_clear_state(cmd_buffer, image, aspect,
> > > > > > > + ANV_FAST_CLEAR_NONE);
> > > > > > >
> > > > > > > uint32_t plane = anv_image_aspect_to_plane(image->aspects,
> > > > aspect);
> > > > > > > enum isl_aux_usage aux_usage = image->planes[plane].aux_
> > usage;
> > > > > > > @@ -517,7 +692,7 @@ init_fast_clear_state_entry(struct
> > > > anv_cmd_buffer
> > > > > > *cmd_buffer,
> > > > > > > * values in the clear value dword(s).
> > > > > > > */
> > > > > > > struct anv_address addr =
> > > > > > > - anv_image_get_clear_color_addr(cmd_buffer->device, image,
> > > > > > aspect, level);
> > > > > > > + anv_image_get_clear_color_addr(cmd_buffer->device, image,
> > > > > > aspect);
> > > > > > > unsigned i = 0;
> > > > > > > for (; i < cmd_buffer->device->isl_dev.ss.clear_value_size;
> > i
> > > > += 4)
> > > > > > {
> > > > > > > anv_batch_emit(&cmd_buffer->batch,
> > GENX(MI_STORE_DATA_IMM),
> > > > sdi)
> > > > > > {
> > > > > > > @@ -558,19 +733,17 @@ genX(copy_fast_clear_dwords)(struct
> > > > > > anv_cmd_buffer *cmd_buffer,
> > > > > > > struct anv_state surface_state,
> > > > > > > const struct anv_image *image,
> > > > > > > VkImageAspectFlagBits aspect,
> > > > > > > - unsigned level,
> > > > > > > bool copy_from_surface_state)
> > > > > > > {
> > > > > > > assert(cmd_buffer && image);
> > > > > > > assert(image->aspects & VK_IMAGE_ASPECT_ANY_COLOR_BIT_ANV);
> > > > > > > - assert(level < anv_image_aux_levels(image, aspect));
> > > > > > >
> > > > > > > struct anv_bo *ss_bo =
> > > > > > > &cmd_buffer->device->surface_state_pool.block_pool.bo;
> > > > > > > uint32_t ss_clear_offset = surface_state.offset +
> > > > > > > cmd_buffer->device->isl_dev.ss.clear_value_offset;
> > > > > > > const struct anv_address entry_addr =
> > > > > > > - anv_image_get_clear_color_addr(cmd_buffer->device, image,
> > > > > > aspect, level);
> > > > > > > + anv_image_get_clear_color_addr(cmd_buffer->device, image,
> > > > > > aspect);
> > > > > > > unsigned copy_size = cmd_buffer->device->isl_dev.
> > > > > > ss.clear_value_size;
> > > > > > >
> > > > > > > if (copy_from_surface_state) {
> > > > > > > @@ -660,18 +833,6 @@ transition_color_buffer(struct
> > anv_cmd_buffer
> > > > > > *cmd_buffer,
> > > > > > > if (base_layer >= anv_image_aux_layers(image, aspect,
> > > > base_level))
> > > > > > > return;
> > > > > > >
> > > > > > > - /* A transition of a 3D subresource works on all slices at a
> > > > time. */
> > > > > > > - if (image->type == VK_IMAGE_TYPE_3D) {
> > > > > > > - base_layer = 0;
> > > > > > > - layer_count = anv_minify(image->extent.depth,
> > base_level);
> > > > > > > - }
> > > > > > > -
> > > > > > > - /* We're interested in the subresource range subset that has
> > aux
> > > > > > data. */
> > > > > > > - level_count = MIN2(level_count, anv_image_aux_levels(image,
> > > > aspect)
> > > > > > - base_level);
> > > > > > > - layer_count = MIN2(layer_count,
> > > > > > > - anv_image_aux_layers(image, aspect,
> > > > base_level) -
> > > > > > base_layer);
> > > > > > > - last_level_num = base_level + level_count;
> > > > > > > -
> > > > > > > assert(image->tiling == VK_IMAGE_TILING_OPTIMAL);
> > > > > > >
> > > > > > > if (initial_layout == VK_IMAGE_LAYOUT_UNDEFINED ||
> > > > > > > @@ -684,8 +845,8 @@ transition_color_buffer(struct anv_cmd_buffer
> > > > > > *cmd_buffer,
> > > > > > > *
> > > > > > > * Initialize the relevant clear buffer entries.
> > > > > > > */
> > > > > > > - for (unsigned level = base_level; level < last_level_num;
> > > > level++)
> > > > > > > - init_fast_clear_state_entry(cmd_buffer, image, aspect,
> > > > level);
> > > > > > > + if (base_level == 0 && base_layer == 0)
> > > > > > > + init_fast_clear_color(cmd_buffer, image, aspect);
> > > > > > >
> > > > > > > /* Initialize the aux buffers to enable correct
> > rendering. In
> > > > > > order to
> > > > > > > * ensure that things such as storage images work
> > correctly,
> > > > aux
> > > > > > buffers
> > > > > > > @@ -723,13 +884,18 @@ transition_color_buffer(struct
> > anv_cmd_buffer
> > > > > > *cmd_buffer,
> > > > > > > if (image->samples == 1) {
> > > > > > > for (uint32_t l = 0; l < level_count; l++) {
> > > > > > > const uint32_t level = base_level + l;
> > > > > > > - const uint32_t level_layer_count =
> > > > > > > + uint32_t level_layer_count =
> > > > > > > MIN2(layer_count, anv_image_aux_layers(image,
> > aspect,
> > > > > > level));
> > > > > > > +
> > > > > > > anv_image_ccs_op(cmd_buffer, image, aspect, level,
> > > > > > > base_layer, level_layer_count,
> > > > > > > ISL_AUX_OP_AMBIGUATE, false);
> > > > > >
> > > > > > On gen8, we're doing extra ambiguates when the level is greater
> > than 0.
> > > > > >
> > > > >
> > > > > Only if we never render to it with CCS_D enabled on miplevels > 0.
> > I'm
> > > > not
> > > > > sure if that's guaranteed by the current code.
> > > > >
> > > >
> > > > You're right. Could you put a perf_warn for rendering with CCS_D
> > enabled
> > > > on miplevels > 0 for gen8 in the patch,
> > > > "anv: Only fast clear single-slice images"?
> > > >
> > >
> > > We already warn for everything that isn't just single-slice rendering to
> > > the first slice. We could make a special warning for gen8+ but I don't
> > > really see a reason.
> > >
> >
> > Where do we do this? As far as I know, we only warn for fast-clears that
> > get turned off due to the target not being the first slice.
> >
Thinking about it some more, I suppose it doesn't matter as much as I
thought. If the API is going to render to a miplevel > 1, they will
likely fast clear it first. The only case we won't catch (due to the
current code structuring) is when the user does a non-zero-one clear.
Given the data from our testing thus far, the probability of such a
thing happening is likely low.
-Nanley
> > > --Jason
> > >
> > >
> > > > -Nanley
> > > >
> > > > > --Jason
> > > > >
> > > > >
> > > > > > -Nanley
> > > > > >
> > > > > > > - genX(set_image_needs_resolve)(cmd_buffer, image,
> > > > > > > - aspect, level, false);
> > > > > > > +
> > > > > > > + if (image->planes[plane].aux_usage ==
> > > > ISL_AUX_USAGE_CCS_E)
> > > > > > {
> > > > > > > + set_image_compressed_bit(cmd_buffer, image,
> > aspect,
> > > > > > > + level, base_layer,
> > > > > > level_layer_count,
> > > > > > > + false);
> > > > > > > + }
> > > > > > > }
> > > > > > > } else {
> > > > > > > if (image->samples == 4 || image->samples == 16) {
> > > > > > > @@ -812,19 +978,17 @@ transition_color_buffer(struct
> > anv_cmd_buffer
> > > > > > *cmd_buffer,
> > > > > > > cmd_buffer->state.pending_pipe_bits |=
> > > > > > > ANV_PIPE_RENDER_TARGET_CACHE_FLUSH_BIT |
> > > > ANV_PIPE_CS_STALL_BIT;
> > > > > > >
> > > > > > > - for (uint32_t level = base_level; level < last_level_num;
> > > > level++) {
> > > > > > > + for (uint32_t l = 0; l < level_count; l++) {
> > > > > > > + uint32_t level = base_level + l;
> > > > > > > + uint32_t level_layer_count =
> > > > > > > + MIN2(layer_count, anv_image_aux_layers(image, aspect,
> > > > level));
> > > > > > >
> > > > > > > - /* The number of layers changes at each 3D miplevel. */
> > > > > > > - if (image->type == VK_IMAGE_TYPE_3D) {
> > > > > > > - layer_count = MIN2(layer_count,
> > anv_image_aux_layers(image,
> > > > > > aspect, level));
> > > > > > > + for (uint32_t a = 0; a < level_layer_count; a++) {
> > > > > > > + uint32_t array_layer = base_layer + a;
> > > > > > > + anv_cmd_predicated_ccs_resolve(cmd_buffer, image,
> > aspect,
> > > > > > > + level, array_layer,
> > > > resolve_op,
> > > > > > > + final_fast_clear);
> > > > > > > }
> > > > > > > -
> > > > > > > - genX(load_needs_resolve_predicate)(cmd_buffer, image,
> > aspect,
> > > > > > level);
> > > > > > > -
> > > > > > > - anv_image_ccs_op(cmd_buffer, image, aspect, level,
> > > > > > > - base_layer, layer_count, resolve_op,
> > true);
> > > > > > > -
> > > > > > > - genX(set_image_needs_resolve)(cmd_buffer, image, aspect,
> > > > level,
> > > > > > false);
> > > > > > > }
> > > > > > >
> > > > > > > cmd_buffer->state.pending_pipe_bits |=
> > > > > > > @@ -1488,12 +1652,20 @@ void genX(CmdPipelineBarrier)(
> > > > > > > anv_image_expand_aspects(image, range->aspectMask);
> > > > > > > uint32_t aspect_bit;
> > > > > > >
> > > > > > > + uint32_t base_layer, layer_count;
> > > > > > > + if (image->type == VK_IMAGE_TYPE_3D) {
> > > > > > > + base_layer = 0;
> > > > > > > + layer_count = anv_minify(image->extent.depth,
> > > > > > range->baseMipLevel);
> > > > > > > + } else {
> > > > > > > + base_layer = range->baseArrayLayer;
> > > > > > > + layer_count = anv_get_layerCount(image, range);
> > > > > > > + }
> > > > > > > +
> > > > > > > anv_foreach_image_aspect_bit(aspect_bit, image,
> > > > > > color_aspects) {
> > > > > > > transition_color_buffer(cmd_buffer, image, 1UL <<
> > > > > > aspect_bit,
> > > > > > > range->baseMipLevel,
> > > > > > > anv_get_levelCount(image,
> > > > range),
> > > > > > > - range->baseArrayLayer,
> > > > > > > - anv_get_layerCount(image,
> > > > range),
> > > > > > > + base_layer, layer_count,
> > > > > > > pImageMemoryBarriers[i].
> > > > oldLayout,
> > > > > > > pImageMemoryBarriers[i].
> > > > newLayout);
> > > > > > > }
> > > > > > > @@ -3203,28 +3375,26 @@ cmd_buffer_subpass_sync_fast_
> > > > clear_values(struct
> > > > > > anv_cmd_buffer *cmd_buffer)
> > > > > > > genX(copy_fast_clear_dwords)(cmd_buffer,
> > > > > > att_state->color.state,
> > > > > > > iview->image,
> > > > > > > VK_IMAGE_ASPECT_COLOR_BIT,
> > > > > > > - iview->planes[0].isl.base_
> > > > level,
> > > > > > > true /* copy from ss */);
> > > > > > >
> > > > > > > /* Fast-clears impact whether or not a resolve will be
> > > > > > necessary. */
> > > > > > > - if (iview->image->planes[0].aux_usage ==
> > > > ISL_AUX_USAGE_CCS_E
> > > > > > &&
> > > > > > > - att_state->clear_color_is_zero) {
> > > > > > > + if (att_state->clear_color_is_zero) {
> > > > > > > /* This image always has the auxiliary buffer
> > enabled.
> > > > We
> > > > > > can mark
> > > > > > > * the subresource as not needing a resolve because
> > the
> > > > > > clear color
> > > > > > > * will match what's in every RENDER_SURFACE_STATE
> > > > object
> > > > > > when it's
> > > > > > > * being used for sampling.
> > > > > > > */
> > > > > > > - genX(set_image_needs_resolve)(cmd_buffer,
> > iview->image,
> > > > > > > -
> > VK_IMAGE_ASPECT_COLOR_BIT,
> > > > > > > -
> > iview->planes[0].isl.base_
> > > > > > level,
> > > > > > > - false);
> > > > > > > + set_image_fast_clear_state(cmd_buffer,
> > iview->image,
> > > > > > > +
> > VK_IMAGE_ASPECT_COLOR_BIT,
> > > > > > > +
> > > > ANV_FAST_CLEAR_DEFAULT_VALUE);
> > > > > > > } else {
> > > > > > > - genX(set_image_needs_resolve)(cmd_buffer,
> > iview->image,
> > > > > > > -
> > VK_IMAGE_ASPECT_COLOR_BIT,
> > > > > > > -
> > iview->planes[0].isl.base_
> > > > > > level,
> > > > > > > - true);
> > > > > > > + set_image_fast_clear_state(cmd_buffer,
> > iview->image,
> > > > > > > +
> > VK_IMAGE_ASPECT_COLOR_BIT,
> > > > > > > + ANV_FAST_CLEAR_ANY);
> > > > > > > }
> > > > > > > - } else if (rp_att->load_op == VK_ATTACHMENT_LOAD_OP_LOAD)
> > {
> > > > > > > + } else if (rp_att->load_op == VK_ATTACHMENT_LOAD_OP_LOAD
> > &&
> > > > > > > + iview->planes[0].isl.base_level == 0 &&
> > > > > > > + iview->planes[0].isl.base_array_layer == 0) {
> > > > > > > /* The attachment may have been fast-cleared in a
> > previous
> > > > > > render
> > > > > > > * pass and the value is needed now. Update the surface
> > > > > > state(s).
> > > > > > > *
> > > > > > > @@ -3233,7 +3403,6 @@ cmd_buffer_subpass_sync_fast_
> > > > clear_values(struct
> > > > > > anv_cmd_buffer *cmd_buffer)
> > > > > > > genX(copy_fast_clear_dwords)(cmd_buffer,
> > > > > > att_state->color.state,
> > > > > > > iview->image,
> > > > > > > VK_IMAGE_ASPECT_COLOR_BIT,
> > > > > > > - iview->planes[0].isl.base_
> > > > level,
> > > > > > > false /* copy to ss */);
> > > > > > >
> > > > > > > if (need_input_attachment_state(rp_att) &&
> > > > > > > @@ -3241,7 +3410,6 @@ cmd_buffer_subpass_sync_fast_
> > > > clear_values(struct
> > > > > > anv_cmd_buffer *cmd_buffer)
> > > > > > > genX(copy_fast_clear_dwords)(cmd_buffer,
> > > > > > att_state->input.state,
> > > > > > > iview->image,
> > > > > > >
> > VK_IMAGE_ASPECT_COLOR_BIT,
> > > > > > > -
> > iview->planes[0].isl.base_
> > > > > > level,
> > > > > > > false /* copy to ss
> > */);
> > > > > > > }
> > > > > > > }
> > > > > > > --
> > > > > > > 2.5.0.400.gff86faf
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > mesa-dev mailing list
> > > > > > > mesa-dev at lists.freedesktop.org
> > > > > > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > > > > >
> > > >
> >
More information about the mesa-dev
mailing list