<div dir="auto"><div><br><br><div class="gmail_quote"><div dir="ltr">On Mon, Apr 16, 2018, 4:52 AM Christian König <<a href="mailto:ckoenig.leichtzumerken@gmail.com">ckoenig.leichtzumerken@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 15.04.2018 at 20:46, Nicolai Hähnle wrote:<br>
> On 07.04.2018 04:31, Marek Olšák wrote:<br>
>> From: Marek Olšák <<a href="mailto:marek.olsak@amd.com" target="_blank" rel="noreferrer">marek.olsak@amd.com</a>><br>
>><br>
>> (This patch doesn't enable the behavior. It will be enabled in a later<br>
>> commit.)<br>
>><br>
>> Draw calls from multiple IBs can be executed in parallel.<br>
>><br>
>> v2: do emit partial flushes on SI<br>
>> v3: invalidate all shader caches at the beginning of IBs<br>
>> v4: don't call si_emit_cache_flush in si_flush_gfx_cs if not needed,<br>
>> only do this for flushes invoked internally<br>
>> v5: empty IBs should wait for idle if the flush requires it<br>
>> v6: split the commit<br>
>><br>
>> If we artificially limit the number of draw calls per IB to 5, we'll get<br>
>> a lot more IBs, leading to a lot more partial flushes. Let's see how<br>
>> the removal of partial flushes changes GPU utilization in that scenario:<br>
>><br>
>> With partial flushes (time busy):<br>
>> CP: 99%<br>
>> SPI: 86%<br>
>> CB: 73%<br>
>><br>
>> Without partial flushes (time busy):<br>
>> CP: 99%<br>
>> SPI: 93%<br>
>> CB: 81%<br>
>> ---<br>
>> src/gallium/drivers/radeon/radeon_winsys.h | 7 ++++<br>
>> src/gallium/drivers/radeonsi/si_gfx_cs.c | 52 ++++++++++++++++++++++--------<br>
>> src/gallium/drivers/radeonsi/si_pipe.h | 1 +<br>
>> 3 files changed, 46 insertions(+), 14 deletions(-)<br>
>> [snip]<br>
>> + /* Always invalidate caches at the beginning of IBs, because external<br>
>> + * users (e.g. BO evictions and SDMA/UVD/VCE IBs) can modify our<br>
>> + * buffers.<br>
>> + *<br>
>> + * Note that the cache flush done by the kernel at the end of GFX IBs<br>
>> + * isn't useful here, because that flush can finish after the following<br>
>> + * IB starts drawing.<br>
>> + *<br>
>> + * TODO: Do we also need to invalidate CB & DB caches?<br>
><br>
> I don't think so.<br>
><br>
> Kernel buffer move: CB & DB caches use logical addressing, so should <br>
> be unaffected.<br>
<br>
Are you sure about that? Basically we don't do any extra invalidation <br>
when BOs are moved by the kernel.<br>
<br>
But on the other hand the worst that could happen when we skip <br>
invalidation is that we don't read the same data into the caches which <br>
is already in the caches. I.e. the content of the BO doesn't change, <br>
just its location.<br></blockquote></div></div><div dir="auto"><br></div><div dir="auto">When SDMA is moving a buffer, that buffer is not being used by the gfx queue. Caches are guaranteed to be invalidated after the last use of the buffer, i.e. before SDMA starts. I don't see a way for caches to be warm when SDMA completes.</div><div dir="auto"><br></div><div dir="auto">Marek</div><div dir="auto"><br></div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
In other words it depends how the CB&DB caches work.<br>
<br>
Christian.<br>
<br>
><br>
> UVD: APIs should forbid writing to the currently bound framebuffer.<br>
><br>
> CPU: Shouldn't be writing directly to the framebuffer, and even if it <br>
> does (linear framebuffer?), I believe OpenGL requires re-binding the <br>
> framebuffer.<br>
><br>
> Cheers,<br>
> Nicolai<br>
><br>
><br>
>> + */<br>
>> + ctx->flags |= SI_CONTEXT_INV_ICACHE |<br>
>> + SI_CONTEXT_INV_SMEM_L1 |<br>
>> + SI_CONTEXT_INV_VMEM_L1 |<br>
>> + SI_CONTEXT_INV_GLOBAL_L2 |<br>
>> + SI_CONTEXT_START_PIPELINE_STATS;<br>
>> /* set all valid group as dirty so they get reemited on<br>
>> * next draw command<br>
>> */<br>
>> si_pm4_reset_emitted(ctx);<br>
>> /* The CS initialization should be emitted before everything else. */<br>
>> si_pm4_emit(ctx, ctx->init_config);<br>
>> if (ctx->init_config_gs_rings)<br>
>> si_pm4_emit(ctx, ctx->init_config_gs_rings);<br>
>> diff --git a/src/gallium/drivers/radeonsi/si_pipe.h b/src/gallium/drivers/radeonsi/si_pipe.h<br>
>> index 0c90a6c6e46..f0f323ff3a7 100644<br>
>> --- a/src/gallium/drivers/radeonsi/si_pipe.h<br>
>> +++ b/src/gallium/drivers/radeonsi/si_pipe.h<br>
>> @@ -540,20 +540,21 @@ struct si_context {<br>
>> void *vs_blit_texcoord;<br>
>> struct si_screen *screen;<br>
>> struct pipe_debug_callback debug;<br>
>> LLVMTargetMachineRef tm; /* only non-threaded compilation */<br>
>> struct si_shader_ctx_state fixed_func_tcs_shader;<br>
>> struct r600_resource *wait_mem_scratch;<br>
>> unsigned wait_mem_number;<br>
>> uint16_t prefetch_L2_mask;<br>
>> bool gfx_flush_in_progress:1;<br>
>> + bool gfx_last_ib_is_busy:1;<br>
>> bool compute_is_busy:1;<br>
>> unsigned num_gfx_cs_flushes;<br>
>> unsigned initial_gfx_cs_size;<br>
>> unsigned gpu_reset_counter;<br>
>> unsigned last_dirty_tex_counter;<br>
>> unsigned last_compressed_colortex_counter;<br>
>> unsigned last_num_draw_calls;<br>
>> unsigned flags; /* flush flags */<br>
>> /* Current unaccounted memory usage. */<br>
>><br>
><br>
><br>
<br>
</blockquote></div></div></div>