[Mesa-dev] [PATCH 1/3] radeonsi: implement mechanism for IBs without partial flushes at the end (v6)

Mon Apr 16 14:50:04 UTC 2018

On Mon, Apr 16, 2018, 4:52 AM Christian König <
ckoenig.leichtzumerken at gmail.com> wrote:

> On 15.04.2018 at 20:46, Nicolai Hähnle wrote:
> > On 07.04.2018 04:31, Marek Olšák wrote:
> >> From: Marek Olšák <marek.olsak at amd.com>
> >>
> >> (This patch doesn't enable the behavior. It will be enabled in a later
> >> commit.)
> >>
> >> Draw calls from multiple IBs can be executed in parallel.
> >>
> >> v2: do emit partial flushes on SI
> >> v3: invalidate all shader caches at the beginning of IBs
> >> v4: don't call si_emit_cache_flush in si_flush_gfx_cs if not needed,
> >>      only do this for flushes invoked internally
> >> v5: empty IBs should wait for idle if the flush requires it
> >> v6: split the commit
> >>
> >> If we artificially limit the number of draw calls per IB to 5, we'll get
> >> a lot more IBs, leading to a lot more partial flushes. Let's see how
> >> the removal of partial flushes changes GPU utilization in that scenario:
> >>
> >> With partial flushes (time busy):
> >>      CP: 99%
> >>      SPI: 86%
> >>      CB: 73%
> >>
> >> Without partial flushes (time busy):
> >>      CP: 99%
> >>      SPI: 93%
> >>      CB: 81%
> >> ---
> >>   src/gallium/drivers/radeon/radeon_winsys.h |  7 ++++
> >>   src/gallium/drivers/radeonsi/si_gfx_cs.c   | 52 ++++++++++++++++++++++--------
> >>   src/gallium/drivers/radeonsi/si_pipe.h     |  1 +
> >>   3 files changed, 46 insertions(+), 14 deletions(-)
> >> [snip]
> >> +    /* Always invalidate caches at the beginning of IBs, because external
> >> +     * users (e.g. BO evictions and SDMA/UVD/VCE IBs) can modify our
> >> +     * buffers.
> >> +     *
> >> +     * Note that the cache flush done by the kernel at the end of GFX IBs
> >> +     * isn't useful here, because that flush can finish after the following
> >> +     * IB starts drawing.
> >> +     *
> >> +     * TODO: Do we also need to invalidate CB & DB caches?
> >
> > I don't think so.
> >
> > Kernel buffer move: CB & DB caches use logical addressing, so should
> > be unaffected.
>
> Are you sure about that? Basically we don't do any extra invalidation
> when BOs are moved by the kernel.
>
> But on the other hand, the worst that could happen if we skip
> invalidation is that we don't re-read data that is already in the
> caches. That is, the content of the BO doesn't change, just its
> location.
>

When sdma is moving a buffer, that buffer is not being used by the gfx
queue. Caches are guaranteed to be invalidated after the last use of the
buffer, i.e. before sdma starts. I don't see a way for caches to be warm
when sdma completes.
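
To make the ordering argument concrete, here is a minimal toy model in C.
It is only a sketch under stated assumptions, not driver code: the struct
and the toy_* functions are hypothetical names invented for illustration,
and the model simply encodes the two claims above (gfx does not use a
buffer while SDMA moves it, and caches are invalidated after the last gfx
use).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not radeonsi code: one buffer plus a flag saying whether
 * the gfx caches currently hold any of its data. All names here are
 * hypothetical. */
struct toy_buffer {
    bool cached_by_gfx;  /* gfx caches hold lines of this buffer */
    bool in_use_by_gfx;  /* a gfx IB still references this buffer */
    unsigned location;   /* made-up placement id, e.g. 0 or 1 */
};

/* A gfx IB touches the buffer, which warms the caches. */
static void toy_gfx_use(struct toy_buffer *buf)
{
    buf->in_use_by_gfx = true;
    buf->cached_by_gfx = true;
}

/* Assumption taken from the argument above: the caches are invalidated
 * after the last gfx use of the buffer. */
static void toy_gfx_finish(struct toy_buffer *buf)
{
    buf->cached_by_gfx = false;
    buf->in_use_by_gfx = false;
}

/* The move is refused while gfx still uses the buffer; this models the
 * claim that SDMA only moves buffers that gfx is not using. */
static bool toy_sdma_move(struct toy_buffer *buf, unsigned new_location)
{
    if (buf->in_use_by_gfx)
        return false;
    buf->location = new_location;
    return true;
}
```

In this model, the only sequence that lets toy_sdma_move() succeed is
toy_gfx_use(), then toy_gfx_finish(), then the move, and at that point
cached_by_gfx is already false. Whether the real CB and DB caches behave
like this is exactly the open question in this thread.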

Marek

> In other words it depends how the CB&DB caches work.
>
> Christian.
>
> >
> > UVD: APIs should forbid writing to the currently bound framebuffer.
> >
> > CPU: Shouldn't be writing directly to the framebuffer, and even if it
> > does (linear framebuffer?), I believe OpenGL requires re-binding the
> > framebuffer.
> >
> > Cheers,
> > Nicolai
> >
> >
> >> +     */
> >> +    ctx->flags |= SI_CONTEXT_INV_ICACHE |
> >> +              SI_CONTEXT_INV_SMEM_L1 |
> >> +              SI_CONTEXT_INV_VMEM_L1 |
> >> +              SI_CONTEXT_INV_GLOBAL_L2 |
> >> +              SI_CONTEXT_START_PIPELINE_STATS;
> >>         /* set all valid group as dirty so they get reemited on
> >>        * next draw command
> >>        */
> >>       si_pm4_reset_emitted(ctx);
> >>         /* The CS initialization should be emitted before everything else. */
> >>       si_pm4_emit(ctx, ctx->init_config);
> >>       if (ctx->init_config_gs_rings)
> >>           si_pm4_emit(ctx, ctx->init_config_gs_rings);
> >> diff --git a/src/gallium/drivers/radeonsi/si_pipe.h b/src/gallium/drivers/radeonsi/si_pipe.h
> >> index 0c90a6c6e46..f0f323ff3a7 100644
> >> --- a/src/gallium/drivers/radeonsi/si_pipe.h
> >> +++ b/src/gallium/drivers/radeonsi/si_pipe.h
> >> @@ -540,20 +540,21 @@ struct si_context {
> >>       void                *vs_blit_texcoord;
> >>       struct si_screen        *screen;
> >>       struct pipe_debug_callback    debug;
> >>       LLVMTargetMachineRef        tm; /* only non-threaded compilation */
> >>       struct si_shader_ctx_state    fixed_func_tcs_shader;
> >>       struct r600_resource        *wait_mem_scratch;
> >>       unsigned            wait_mem_number;
> >>       uint16_t            prefetch_L2_mask;
> >>         bool                gfx_flush_in_progress:1;
> >> +    bool                gfx_last_ib_is_busy:1;
> >>       bool                compute_is_busy:1;
> >>         unsigned            num_gfx_cs_flushes;
> >>       unsigned            initial_gfx_cs_size;
> >>       unsigned            gpu_reset_counter;
> >>       unsigned            last_dirty_tex_counter;
> >>       unsigned            last_compressed_colortex_counter;
> >>       unsigned            last_num_draw_calls;
> >>       unsigned            flags; /* flush flags */
> >>       /* Current unaccounted memory usage. */
> >>
> >
> >
>