<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Fri, Jul 14, 2017 at 2:42 AM, Nanley Chery <span dir="ltr"><<a href="mailto:nanleychery@gmail.com" target="_blank">nanleychery@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Mon, Jul 10, 2017 at 10:14:21AM -0700, Jason Ekstrand wrote:<br>
> On Wed, Jun 28, 2017 at 2:14 PM, Nanley Chery <<a href="mailto:nanleychery@gmail.com">nanleychery@gmail.com</a>> wrote:<br>
><br>
> > v2: Expound on comment for the pipe controls (Jason Ekstrand).<br>
> ><br>
> > Signed-off-by: Nanley Chery <<a href="mailto:nanley.g.chery@intel.com">nanley.g.chery@intel.com</a>><br>
> > ---<br>
> > src/intel/vulkan/anv_blorp.c | 4 +-<br>
> > src/intel/vulkan/genX_cmd_<wbr>buffer.c | 183 ++++++++++++++++++++++++++++++<br>
> > +++----<br>
> > 2 files changed, 167 insertions(+), 20 deletions(-)<br>
> ><br>
> > diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c<br>
> > index 459d57ec57..84b01e8792 100644<br>
> > --- a/src/intel/vulkan/anv_blorp.c<br>
> > +++ b/src/intel/vulkan/anv_blorp.c<br>
> > @@ -1451,7 +1451,9 @@ anv_image_ccs_clear(struct anv_cmd_buffer<br>
> > *cmd_buffer,<br>
> ><br>
> > struct blorp_surf surf;<br>
> > get_blorp_surf_for_anv_image(<wbr>image, VK_IMAGE_ASPECT_COLOR_BIT,<br>
> > - image->aux_usage, &surf);<br>
> > + image->aux_usage == ISL_AUX_USAGE_CCS_E ?<br>
> > + ISL_AUX_USAGE_CCS_E : ISL_AUX_USAGE_CCS_D,<br>
> > + &surf);<br>
> ><br>
> > /* From the Sky Lake PRM Vol. 7, "Render Target Fast Clear":<br>
> > *<br>
> > diff --git a/src/intel/vulkan/genX_cmd_<wbr>buffer.c<br>
> > b/src/intel/vulkan/genX_cmd_<wbr>buffer.c<br>
> > index decf0b28d6..1a9b841c7c 100644<br>
> > --- a/src/intel/vulkan/genX_cmd_<wbr>buffer.c<br>
> > +++ b/src/intel/vulkan/genX_cmd_<wbr>buffer.c<br>
> > @@ -524,6 +524,17 @@ genX(copy_fast_clear_dwords)(<wbr>struct anv_cmd_buffer<br>
> > *cmd_buffer,<br>
> > }<br>
> > }<br>
> ><br>
> > +/**<br>
> > + * @brief Transitions a color buffer from one layout to another.<br>
> > + *<br>
> > + * See section 6.1.1. Image Layout Transitions of the Vulkan 1.0.50 spec<br>
> > for<br>
> > + * more information.<br>
> > + *<br>
> > + * @param level_count VK_REMAINING_MIP_LEVELS isn't supported.<br>
> > + * @param layer_count VK_REMAINING_ARRAY_LAYERS isn't supported. For 3D<br>
> > images,<br>
> > + * this represents the maximum layers to transition at<br>
> > each<br>
> > + * specified miplevel.<br>
> > + */<br>
> > static void<br>
> > transition_color_buffer(struct anv_cmd_buffer *cmd_buffer,<br>
> > const struct anv_image *image,<br>
> > @@ -532,13 +543,27 @@ transition_color_buffer(struct anv_cmd_buffer<br>
> > *cmd_buffer,<br>
> > VkImageLayout initial_layout,<br>
> > VkImageLayout final_layout)<br>
> > {<br>
> > - assert(image->aspects == VK_IMAGE_ASPECT_COLOR_BIT);<br>
> > -<br>
> > - if (image->aux_surface.isl.size == 0)<br>
> > - return;<br>
> > -<br>
> > - if (initial_layout != VK_IMAGE_LAYOUT_UNDEFINED &&<br>
> > - initial_layout != VK_IMAGE_LAYOUT_<wbr>PREINITIALIZED)<br>
> > + /* Validate the inputs. */<br>
> > + assert(cmd_buffer);<br>
> > + assert(image && image->aspects == VK_IMAGE_ASPECT_COLOR_BIT);<br>
> > + /* These values aren't supported for simplicity's sake. */<br>
> > + assert(level_count != VK_REMAINING_MIP_LEVELS &&<br>
> > + layer_count != VK_REMAINING_ARRAY_LAYERS);<br>
> > + /* Ensure the subresource range is valid. */<br>
> > + uint64_t last_level_num = base_level + level_count;<br>
> > + const uint32_t max_depth = anv_minify(image->extent.<wbr>depth,<br>
> > base_level);<br>
> > + const uint32_t image_layers = MAX2(image->array_size, max_depth);<br>
> > + assert(base_layer + layer_count <= image_layers);<br>
> > + assert(last_level_num <= image->levels);<br>
> > + /* The spec disallows these final layouts. */<br>
> > + assert(final_layout != VK_IMAGE_LAYOUT_UNDEFINED &&<br>
> > + final_layout != VK_IMAGE_LAYOUT_<wbr>PREINITIALIZED);<br>
> > +<br>
> > + /* No work is necessary if the layout stays the same or if this<br>
> > subresource<br>
> > + * range lacks auxiliary data.<br>
> > + */<br>
> > + if (initial_layout == final_layout ||<br>
> > + base_layer >= anv_image_aux_layers(image, base_level))<br>
> > return;<br>
> ><br>
> > /* A transition of a 3D subresource works on all slices at a time. */<br>
> > @@ -549,22 +574,142 @@ transition_color_buffer(struct anv_cmd_buffer<br>
> > *cmd_buffer,<br>
> ><br>
> > /* We're interested in the subresource range subset that has aux data.<br>
> > */<br>
> > level_count = MIN2(level_count, anv_image_aux_levels(image));<br>
> > + layer_count = MIN2(layer_count, anv_image_aux_layers(image,<br>
> > base_level));<br>
> ><br>
><br>
> Is this correct? I think we want MIN2(layer_count, anv_image_aux_layers()<br>
> - base_layer), don't we? This would also mean there's a bug in the current<br>
> level_count.<br>
><br>
><br>
<br>
</div></div>I see nothing wrong with it. The current equation limits the<br>
user-specified the layer_count to the actual number of layers which can<br>
be transitioned. What's the meaning behind subtracting base_layer?<br><div><div class="h5"></div></div></blockquote><div><br></div><div>Because anv_image_aux_layers is the number of layers relative to 0 not relative to base_layer. The layer_count parameter, on the other hand, is relative to base_layer. Put differently, the maximum layer they will touch is base_layer + layer_count - 1 where the max layer in the image is anv_image_aux_layers() - 1. So, if we're going to clamp layer_count, we need to clamp it to anv_imagae_aux_layers() - base_layer.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5">
> > + last_level_num = base_level + level_count;<br>
> > +<br>
> > + /* Record whether or not the layout is undefined. Pre-initialized<br>
> > images<br>
> > + * with auxiliary buffers have a non-linear layout and are thus<br>
> > undefined.<br>
> > + */<br>
> > + assert(image->tiling == VK_IMAGE_TILING_OPTIMAL);<br>
> > + const bool undef_layout = initial_layout == VK_IMAGE_LAYOUT_UNDEFINED<br>
> > ||<br>
> > + initial_layout == VK_IMAGE_LAYOUT_<br>
> > PREINITIALIZED;<br>
> ><br>
> > - /* We're transitioning from an undefined layout. We must ensure that<br>
> > the<br>
> > - * clear values buffer is filled with valid data.<br>
> > + /* Do preparatory work before the resolve operation or return early if<br>
> > no<br>
> > + * resolve is actually needed.<br>
> > */<br>
> > - for (unsigned l = 0; l < level_count; l++)<br>
> > - init_fast_clear_state_entry(<wbr>cmd_buffer, image, base_level + l);<br>
> > -<br>
> > - if (image->aux_usage == ISL_AUX_USAGE_CCS_E) {<br>
> > - /* We're transitioning from an undefined layout so it doesn't really<br>
> > - * matter what data ends up in the color buffer. We do, however,<br>
> > need to<br>
> > - * ensure that the CCS has valid data in it. One easy way to do<br>
> > that is<br>
> > - * to fast-clear the specified range.<br>
> > + if (undef_layout) {<br>
> > + /* A subresource in the undefined layout may have been aliased and<br>
> > + * populated with any arrangement of bits. Therefore, we must<br>
> > initialize<br>
> > + * the related aux buffer and clear buffer entry with desirable<br>
> > values.<br>
> > + *<br>
> > + * Initialize the relevant clear buffer entries.<br>
> > */<br>
> > - anv_image_ccs_clear(cmd_<wbr>buffer, image, base_level, level_count,<br>
> > - base_layer, layer_count);<br>
> > + for (unsigned level = base_level; level < last_level_num; level++)<br>
> > + init_fast_clear_state_entry(<wbr>cmd_buffer, image, level);<br>
> > +<br>
> > + /* MCS buffers have been initialized for correct behaviour. */<br>
> ><br>
><br>
> I don't think this is correct. I can definitely see MCS going off the<br>
> rails in a similar way to CCS if things aren't initialized correctly.<br>
> Unfortunately, MSAA test coverage is defintiely sub-par right now. I think<br>
> I'd feel more comfortable if we fast-cleared them too. Fortunately,<br>
> though, we always leave basic MCS on for MSAA surfaces so no resolve will<br>
> ever be needed.<br>
><br>
><br>
<br>
</div></div>That's true. 2x and 8x MCS surfaces require that some of the upper bits<br>
be zero, so they be should be fast-cleared as well. I'll make a separate<br>
patch for fast-clearing MCS and CC it to stable.<br>
<div><div class="h5"><br>
> > + if (image->samples > 1)<br>
> > + return;<br>
> > +<br>
> > + /* Initialize the CCS buffers to enable correct rendering. This<br>
> > operation<br>
> > + * requires up to two steps: one to rid the CCS buffer of data that<br>
> > may<br>
> > + * cause GPU hangs, and another to ensure that writes done without<br>
> > CCS<br>
> > + * will be visible to reads done with CCS.<br>
> > + *<br>
> > + * On SKL+, it becomes possible to have a CCS buffer with invalid<br>
> > data.<br>
> > + * One easy way to avoid this state is to fast-clear the specified<br>
> > range.<br>
> > + */<br>
> > + if (GEN_GEN >= 9) {<br>
> > + anv_image_ccs_clear(cmd_<wbr>buffer, image, base_level, level_count,<br>
> > + base_layer, layer_count);<br>
> > + }<br>
> > + /* At this point, some elements of the CCS buffer may have the<br>
> > fast-clear<br>
> > + * bit-arrangement. As the user writes to a subresource, we need to<br>
> > have<br>
> > + * the associated CCS elements enter the ambiguated state. This<br>
> > enables<br>
> > + * reads (implicit or explicit) to reflect the user-written data<br>
> > instead<br>
> > + * of the clear color. The only time such elements will not change<br>
> > their<br>
> > + * state as described above, is in a final layout that doesn't have<br>
> > CCS<br>
> > + * enabled. In this case, we must force the associated CCS buffers<br>
> > of the<br>
> > + * specified range to enter the ambiguated state in advance.<br>
> > + */<br>
> > + if (image->aux_usage != ISL_AUX_USAGE_CCS_E &&<br>
> > + final_layout != VK_IMAGE_LAYOUT_COLOR_<wbr>ATTACHMENT_OPTIMAL) {<br>
> > + /* The CCS_D buffer may not be enabled in the final layout.<br>
> > Continue<br>
> > + * executing this function to perform a resolve.<br>
> > + */<br>
> > + anv_perf_warn("Performing an additional resolve for CCS_D<br>
> > layout "<br>
> > + "transition. Consider always leaving it on or "<br>
> > + "performing an ambiguation pass.");<br>
> > + } else {<br>
> > + /* Writes in the final layout will be aware of the CCS buffer. In<br>
> > + * addition, the clear buffer entries and the CCS buffers have<br>
> > been<br>
> > + * populated with values that will result in correct rendering.<br>
> > + */<br>
> > + return;<br>
> ><br>
><br>
> Ugh... This all works but it makes me an even bigger fan of that ambiguate<br>
> pass... Maybe we should consider reviving it once this all lands.<br>
><br>
><br>
<br>
</div></div>I can see why. It would simplify this chunk of code a lot.<br>
<div><div class="h5"><br>
> > + }<br>
> > + } else if (initial_layout != VK_IMAGE_LAYOUT_COLOR_<wbr>ATTACHMENT_OPTIMAL)<br>
> > {<br>
> > + /* Resolves are only necessary if the subresource may contain blocks<br>
> > + * fast-cleared to values unsupported in other layouts. This only<br>
> > occurs<br>
> > + * if the initial layout is COLOR_ATTACHMENT_OPTIMAL.<br>
> > + */<br>
> > + return;<br>
> > + } else if (image->samples > 1) {<br>
> > + /* MCS buffers don't need resolving. */<br>
> > + return;<br>
> > }<br>
> > +<br>
> > + /* Perform a resolve to synchronize data between the main and aux<br>
> > buffer.<br>
> > + * Before we begin, we must satisfy the cache flushing requirement<br>
> > specified<br>
> > + * in the Sky Lake PRM Vol. 7, "MCS Buffer for Render Target(s)":<br>
> > + *<br>
> > + * Any transition from any value in {Clear, Render, Resolve} to a<br>
> > + * different value in {Clear, Render, Resolve} requires end of pipe<br>
> > + * synchronization.<br>
> > + *<br>
> > + * We perform a flush of the write cache before and after the clear and<br>
> > + * resolve operations to meet this requirement.<br>
> > + *<br>
> > + * Unlike other drawing, it seems that fast clear operations are not<br>
> > + * properly synchronized. The first PIPE_CONTROL here likely ensures<br>
> > that<br>
> > + * the contents of the previous render or clear hit the render target<br>
> > before<br>
> > + * we resolve and the second likely ensures that the resolve is<br>
> > complete<br>
> > + * before we do any more rendering or clearing.<br>
> ><br>
><br>
> I'm really not a fan of how much this has been weasel-worded...<br>
><br>
> <rant><br>
> One of the big problems with the PRMs is that they don't actually describe<br>
> the hardware. They describe how a theoretical driver can/should program<br>
> the hardware to get correct rendering results. What they don't tell you is<br>
> *why* programming the hardware that way is needed. They get especially bad<br>
> about this around synchronization and image layout. They're also<br>
> frequently missing important details and are sometimes flat-out wrong.<br>
><br>
> The statement "fast clear ops are not properly synchronized with other<br>
> drawing" describes, in some sense, what's going on in the hardware. You<br>
> have two rendering operations in-flight and, due to some hardware detail I<br>
> don't fully understand, they can't safely be in the pipeline at the same<br>
> time. It's not a full description I'll grant, but it at least describes,<br>
> in general terms, the problem the workaround is solving. The language in<br>
> the PRM, on the other hand, describes some engineer's mental model of that<br>
> synchronization problem and a way of working around it.<br>
><br>
> We write drivers for hardware, not for PRMs. I don't think we need to<br>
> weasel-word our comments to make it look like all we have is hair-brained<br>
> theories while whoever wrote the PRM note has the actual knowledge. If<br>
> we're particularly unsure about something then that should be expressed in<br>
> the comment. However, I'm pretty convinced that what's going on here is a<br>
> synchronization problem. Our understanding of fast-clear operations may<br>
> improve in the future and, if/when it does, we'll update the comments.<br>
> </rant><br>
><br>
><br>
<br>
</div></div>Thanks for sharing that point of view. I hadn't really thought that much<br>
about how the PRMs were created. I agree that I can at least remove the<br>
word "seems" from the comment while still being accurate.<br>
<div class="HOEnZb"><div class="h5"><br>
> > + */<br>
> > + cmd_buffer->state.pending_<wbr>pipe_bits |=<br>
> > + ANV_PIPE_RENDER_TARGET_CACHE_<wbr>FLUSH_BIT | ANV_PIPE_CS_STALL_BIT;<br>
> > +<br>
> > + for (uint32_t level = base_level; level < last_level_num; level++) {<br>
> > +<br>
> > + /* The number of layers changes at each 3D miplevel. */<br>
> > + if (image->type == VK_IMAGE_TYPE_3D) {<br>
> > + layer_count = MIN2(layer_count, anv_image_aux_layers(image,<br>
> > level));<br>
> > + }<br>
> > +<br>
> > + /* Create a surface state with the right clear color and perform the<br>
> > + * resolve.<br>
> > + */<br>
> > + struct anv_state surface_state =<br>
> > + anv_cmd_buffer_alloc_surface_<wbr>state(cmd_buffer);<br>
> > + isl_surf_fill_state(&cmd_<wbr>buffer->device->isl_dev,<br>
> > surface_state.map,<br>
> > + .surf = &image->color_surface.isl,<br>
> > + .view = &(struct isl_view) {<br>
> > + .usage = ISL_SURF_USAGE_RENDER_TARGET_<wbr>BIT,<br>
> > + .format = image->color_surface.isl.<wbr>format,<br>
> > + .swizzle = ISL_SWIZZLE_IDENTITY,<br>
> > + .base_level = level,<br>
> > + .levels = 1,<br>
> > + .base_array_layer = base_layer,<br>
> > + .array_len = layer_count,<br>
> > + },<br>
> > + .aux_surf = &image->aux_surface.isl,<br>
> > + .aux_usage = image->aux_usage ==<br>
> > ISL_AUX_USAGE_NONE ?<br>
> > + ISL_AUX_USAGE_CCS_D :<br>
> > image->aux_usage,<br>
> > + .mocs = cmd_buffer->device->default_<wbr>mocs);<br>
> > + add_image_relocs(cmd_buffer, image, VK_IMAGE_ASPECT_COLOR_BIT,<br>
> > + image->aux_usage == ISL_AUX_USAGE_CCS_E ?<br>
> > + ISL_AUX_USAGE_CCS_E : ISL_AUX_USAGE_CCS_D,<br>
> > + surface_state);<br>
> > + anv_state_flush(cmd_buffer-><wbr>device, surface_state);<br>
> > + genX(copy_fast_clear_dwords)(<wbr>cmd_buffer, surface_state, image,<br>
> > level,<br>
> > + false /* copy to ss */);<br>
> > + anv_ccs_resolve(cmd_buffer, surface_state, image, level,<br>
> > layer_count,<br>
> > + image->aux_usage == ISL_AUX_USAGE_CCS_E ?<br>
> > + BLORP_FAST_CLEAR_OP_RESOLVE_<wbr>PARTIAL :<br>
> > + BLORP_FAST_CLEAR_OP_RESOLVE_<wbr>FULL);<br>
> > + }<br>
> > +<br>
> > + cmd_buffer->state.pending_<wbr>pipe_bits |=<br>
> > + ANV_PIPE_RENDER_TARGET_CACHE_<wbr>FLUSH_BIT | ANV_PIPE_CS_STALL_BIT;<br>
> > }<br>
> ><br>
> > /**<br>
> > --<br>
> > 2.13.1<br>
> ><br>
> > ______________________________<wbr>_________________<br>
> > mesa-dev mailing list<br>
> > <a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a><br>
> > <a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/<wbr>mailman/listinfo/mesa-dev</a><br>
> ><br>
</div></div></blockquote></div><br></div></div>