[Mesa-dev] [PATCH 1/3] radeonsi: implement mechanism for IBs without partial flushes at the end (v6)

Sun Apr 15 18:46:21 UTC 2018

On 07.04.2018 04:31, Marek Olšák wrote:
> From: Marek Olšák <marek.olsak at amd.com>
> 
> (This patch doesn't enable the behavior. It will be enabled in a later
> commit.)
> 
> Draw calls from multiple IBs can be executed in parallel.
> 
> v2: do emit partial flushes on SI
> v3: invalidate all shader caches at the beginning of IBs
> v4: don't call si_emit_cache_flush in si_flush_gfx_cs if not needed,
>      only do this for flushes invoked internally
> v5: empty IBs should wait for idle if the flush requires it
> v6: split the commit
> 
> If we artificially limit the number of draw calls per IB to 5, we'll get
> a lot more IBs, leading to a lot more partial flushes. Let's see how
> the removal of partial flushes changes GPU utilization in that scenario:
> 
> With partial flushes (time busy):
>      CP: 99%
>      SPI: 86%
>      CB: 73%
> 
> Without partial flushes (time busy):
>      CP: 99%
>      SPI: 93%
>      CB: 81%
> ---
>   src/gallium/drivers/radeon/radeon_winsys.h |  7 ++++
>   src/gallium/drivers/radeonsi/si_gfx_cs.c   | 52 ++++++++++++++++++++++--------
>   src/gallium/drivers/radeonsi/si_pipe.h     |  1 +
>   3 files changed, 46 insertions(+), 14 deletions(-)
> [snip]
> +	/* Always invalidate caches at the beginning of IBs, because external
> +	 * users (e.g. BO evictions and SDMA/UVD/VCE IBs) can modify our
> +	 * buffers.
> +	 *
> +	 * Note that the cache flush done by the kernel at the end of GFX IBs
> +	 * isn't useful here, because that flush can finish after the following
> +	 * IB starts drawing.
> +	 *
> +	 * TODO: Do we also need to invalidate CB & DB caches?

I don't think so.

Kernel buffer move: CB & DB caches use logical addressing, so they
should be unaffected.

UVD: APIs should forbid writing to the currently bound framebuffer.

CPU: It shouldn't be writing directly to the framebuffer, and even if it
does (linear framebuffer?), I believe OpenGL requires re-binding the
framebuffer.
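
To make the point concrete, here is a minimal standalone sketch of the
flag handling as I understand it: the start of an IB requests only the
shader-side and L2 invalidations, and CB/DB are left alone. Everything
prefixed with sketch_/SKETCH_ is hypothetical and only mirrors the shape
of the code in the patch; the bit values are made up and are not the
real SI_CONTEXT_* definitions from si_pipe.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values, for illustration only. */
enum {
	SKETCH_INV_ICACHE    = 1u << 0,
	SKETCH_INV_SMEM_L1   = 1u << 1,
	SKETCH_INV_VMEM_L1   = 1u << 2,
	SKETCH_INV_GLOBAL_L2 = 1u << 3,
	SKETCH_FLUSH_INV_CB  = 1u << 4,
	SKETCH_FLUSH_INV_DB  = 1u << 5,
};

/* Hypothetical stand-in for the flush-flags field of the context. */
struct sketch_context {
	uint32_t flags; /* pending flush/invalidate requests */
};

/* Start of a new IB: request invalidation of the shader caches and L2,
 * because external users may have modified our buffers. CB and DB
 * flushes are deliberately not requested here.
 */
static void sketch_begin_new_ib(struct sketch_context *ctx)
{
	assert(ctx);
	ctx->flags |= SKETCH_INV_ICACHE |
		      SKETCH_INV_SMEM_L1 |
		      SKETCH_INV_VMEM_L1 |
		      SKETCH_INV_GLOBAL_L2;
}

/* Consume the pending requests: return what would be emitted and
 * clear the pending set so the same flush isn't emitted twice.
 */
static uint32_t sketch_emit_cache_flush(struct sketch_context *ctx)
{
	uint32_t emitted;

	assert(ctx);
	emitted = ctx->flags;
	ctx->flags = 0;
	return emitted;
}
```

If the answer to the TODO turned out to be "yes", the only change in
this model would be OR-ing the CB/DB bits in sketch_begin_new_ib() as
well.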

Cheers,
Nicolai

> +	 */
> +	ctx->flags |= SI_CONTEXT_INV_ICACHE |
> +		      SI_CONTEXT_INV_SMEM_L1 |
> +		      SI_CONTEXT_INV_VMEM_L1 |
> +		      SI_CONTEXT_INV_GLOBAL_L2 |
> +		      SI_CONTEXT_START_PIPELINE_STATS;
>   
>   	/* set all valid group as dirty so they get reemited on
>   	 * next draw command
>   	 */
>   	si_pm4_reset_emitted(ctx);
>   
>   	/* The CS initialization should be emitted before everything else. */
>   	si_pm4_emit(ctx, ctx->init_config);
>   	if (ctx->init_config_gs_rings)
>   		si_pm4_emit(ctx, ctx->init_config_gs_rings);
> diff --git a/src/gallium/drivers/radeonsi/si_pipe.h b/src/gallium/drivers/radeonsi/si_pipe.h
> index 0c90a6c6e46..f0f323ff3a7 100644
> --- a/src/gallium/drivers/radeonsi/si_pipe.h
> +++ b/src/gallium/drivers/radeonsi/si_pipe.h
> @@ -540,20 +540,21 @@ struct si_context {
>   	void				*vs_blit_texcoord;
>   	struct si_screen		*screen;
>   	struct pipe_debug_callback	debug;
>   	LLVMTargetMachineRef		tm; /* only non-threaded compilation */
>   	struct si_shader_ctx_state	fixed_func_tcs_shader;
>   	struct r600_resource		*wait_mem_scratch;
>   	unsigned			wait_mem_number;
>   	uint16_t			prefetch_L2_mask;
>   
>   	bool				gfx_flush_in_progress:1;
> +	bool				gfx_last_ib_is_busy:1;
>   	bool				compute_is_busy:1;
>   
>   	unsigned			num_gfx_cs_flushes;
>   	unsigned			initial_gfx_cs_size;
>   	unsigned			gpu_reset_counter;
>   	unsigned			last_dirty_tex_counter;
>   	unsigned			last_compressed_colortex_counter;
>   	unsigned			last_num_draw_calls;
>   	unsigned			flags; /* flush flags */
>   	/* Current unaccounted memory usage. */
> 

-- 
Learn how the world really is,
But never forget how it ought to be.