[Mesa-dev] [PATCH] anv/gpu_memcpy: CS Stall before a MI memcpy on gen7

Tue Feb 20 19:52:38 UTC 2018

On Sat, Feb 17, 2018 at 1:42 AM, Jason Ekstrand <jason at jlekstrand.net>
wrote:

> On Sat, Feb 17, 2018 at 1:21 AM, Kenneth Graunke <kenneth at whitecape.org>
> wrote:
>
>> On Saturday, February 17, 2018 1:09:10 AM PST Jason Ekstrand wrote:
>> > This fixes a pile of hangs caused by the recent shuffling of resolves
>> > and transitions.  The particularly problematic case is when you have at
>> > least three attachments with load ops of CLEAR, LOAD, CLEAR.  In this
>> > case, we execute the first CLEAR followed by a MI memcpy to copy the
>> > clear values over for the LOAD followed by a second CLEAR.  The MI
>> > commands cause the first CLEAR to hang which causes us to get stuck on
>> > the 3DSTATE_MULTISAMPLE in the second CLEAR.
>> >
>> > We also add guards for BLORP to fix the same issue.  These shouldn't
>> > actually do anything right now because the only use of indirect clears
>> > in BLORP today is for resolves which are already guarded by a render
>> > cache flush and CS stall.  However, this will guard us against potential
>> > issues in the future.
>> >
>> > Cc: Kenneth Graunke <kenneth at whitecape.org>
>> > Cc: Nanley Chery <nanley.g.chery at intel.com>
>> > ---
>> >  src/intel/vulkan/genX_blorp_exec.c | 10 ++++++++++
>> >  src/intel/vulkan/genX_gpu_memcpy.c | 22 ++++++++++++++++++++++
>> >  2 files changed, 32 insertions(+)
>> >
>> > diff --git a/src/intel/vulkan/genX_blorp_exec.c
>> b/src/intel/vulkan/genX_blorp_exec.c
>> > index 04f7675..f956715 100644
>> > --- a/src/intel/vulkan/genX_blorp_exec.c
>> > +++ b/src/intel/vulkan/genX_blorp_exec.c
>> > @@ -200,6 +200,16 @@ genX(blorp_exec)(struct blorp_batch *batch,
>> >        genX(cmd_buffer_config_l3)(cmd_buffer, cfg);
>> >     }
>> >
>> > +#if GEN_GEN == 7
>> > +   /* The MI_LOAD/STORE_REGISTER_MEM commands which BLORP uses to
>> implement
>> > +    * indirect fast-clear colors can cause GPU hangs if we don't stall
>> first.
>> > +    * See genX(cmd_buffer_mi_memcpy) for more details.
>> > +    */
>> > +   assert(params->src.clear_color_addr.buffer == NULL);
>> > +   if (params->dst.clear_color_addr.buffer)
>> > +      cmd_buffer->state.pending_pipe_bits |= ANV_PIPE_CS_STALL_BIT;
>> > +#endif
>> > +
>> >     genX(cmd_buffer_apply_pipe_flushes)(cmd_buffer);
>> >
>> >     genX(flush_pipeline_select_3d)(cmd_buffer);
>> > diff --git a/src/intel/vulkan/genX_gpu_memcpy.c
>> b/src/intel/vulkan/genX_gpu_memcpy.c
>> > index f3ada93..1d527fb 100644
>> > --- a/src/intel/vulkan/genX_gpu_memcpy.c
>> > +++ b/src/intel/vulkan/genX_gpu_memcpy.c
>> > @@ -62,6 +62,28 @@ genX(cmd_buffer_mi_memcpy)(struct anv_cmd_buffer
>> *cmd_buffer,
>> >     assert(dst_offset % 4 == 0);
>> >     assert(src_offset % 4 == 0);
>> >
>> > +#if GEN_GEN == 7
>> > +   /* On gen7, the combination of commands used
>> here(MI_LOAD_REGISTER_MEM
>> > +    * and MI_STORE_REGISTER_MEM) can cause GPU hangs if any rendering
>> is
>> > +    * in-flight when they are issued even if the memory touched is not
>> > +    * currently active for rendering.  The weird bit is that it is not
>> the
>> > +    * MI_LOAD/STORE_REGISTER_MEM commands which hang but rather the
>> in-flight
>> > +    * rendering hangs such that the next stalling command after the
>> > +    * MI_LOAD/STORE_REGISTER_MEM commands will catch the hang.
>> > +    *
>> > +    * It is unclear exactly why this hang occurs.  Both MI commands
>> come with
>> > +    * warnings about the 3D pipeline but that doesn't seem to fully
>> explain
>> > +    * it.  My (Jason's) best theory is that it has something to do
>> with the
>> > +    * fact that we're using a GPU state register as our temporary and
>> that
>> > +    * something with reading/writing it is causing problems.
>> > +    *
>> > +    * In order to work around this issue, we emit a PIPE_CONTROL with
>> the
>> > +    * command streamer stall bit set.
>> > +    */
>> > +   cmd_buffer->state.pending_pipe_bits |= ANV_PIPE_CS_STALL_BIT;
>> > +   genX(cmd_buffer_apply_pipe_flushes)(cmd_buffer);
>> > +#endif
>> > +
>> >     for (uint32_t i = 0; i < size; i += 4) {
>> >        const struct anv_address src_addr =
>> >           (struct anv_address) { src, src_offset + i};
>> >
>>
>> This seems reasonable enough.  I can't say I understand why this is
>> necessary,
>
>
> That makes two of us and, when Nanley gets around to looking at it, I'm
> sure it'll make 3. :-)
>
>
>> but if it fixes the problem...it's probably best to just
>> do it.  We can always change it later if we find new information.
>>
>
> I'm fairly sure that this is more-or-less the correct fix and 10 Jenkins
> runs along with 50 (soon to be 100) runs of dEQP-VK.renderpass.dedicated_allocation.attachment.*
> agree.
>

At this point, that number is more like 250.  I'm now very sure this fixes
the problem.

> Acked-by: Kenneth Graunke <kenneth at whitecape.org>
>>
>
> Thanks!
>
>
>> I'll let Nanley review as well.
>>
>
> I can wait.  I've only been fighting this for a week and a half...
>
> --Jason
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20180220/090200c8/attachment.html>