[Intel-gfx] [PATCH] drm/i915: Reduce engine->emit_flush() to a single mode parameter

Dave Gordon david.s.gordon at intel.com
Thu Jul 28 14:57:11 UTC 2016


On 27/07/16 11:53, Chris Wilson wrote:
> Rather than passing a complete set of GPU cache domains for either
> invalidation or for flushing, or even both, just pass a single parameter
> to the engine->emit_flush to determine the required operations.
>
> engine->emit_flush(GPU, 0) -> engine->emit_flush(EMIT_INVALIDATE)
> engine->emit_flush(0, GPU) -> engine->emit_flush(EMIT_FLUSH)
> engine->emit_flush(GPU, GPU) -> engine->emit_flush(EMIT_FLUSH | EMIT_INVALIDATE)
>
> This allows us to extend the behaviour easily in future, for example if
> we want just a command barrier without the overhead of flushing.
>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Dave Gordon <david.s.gordon at intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_context.c    |  2 +-
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  2 +-
>  drivers/gpu/drm/i915/i915_gem_gtt.c        | 10 ++----
>  drivers/gpu/drm/i915/i915_gem_request.c    |  2 +-
>  drivers/gpu/drm/i915/intel_lrc.c           | 23 +++++-------
>  drivers/gpu/drm/i915/intel_ringbuffer.c    | 57 +++++++++++-------------------
>  drivers/gpu/drm/i915/intel_ringbuffer.h    |  6 ++--
>  7 files changed, 38 insertions(+), 64 deletions(-)

[snip]

> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 39fa9eb10514..671b1cab5e54 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -1666,8 +1666,7 @@ static int hsw_mm_switch(struct i915_hw_ppgtt *ppgtt,
>  	int ret;
>
>  	/* NB: TLBs must be flushed and invalidated before a switch */
> -	ret = engine->emit_flush(req,
> -				 I915_GEM_GPU_DOMAINS, I915_GEM_GPU_DOMAINS);
> +	ret = engine->emit_flush(req, EMIT_INVALIDATE | EMIT_FLUSH);

The example in the commit message says "EMIT_FLUSH | EMIT_INVALIDATE"
which I think looks much nicer. Flush *after* invalidate would be pretty 
meaningless!

For even more syntactic sugar, we could choose "FLUSH + INVALIDATE" as + 
and | are equivalent for disjoint bitfields.

>  int intel_logical_ring_alloc_request_extras(struct drm_i915_gem_request *request)
> @@ -998,9 +998,7 @@ static int intel_logical_ring_workarounds_emit(struct drm_i915_gem_request *req)
>  	if (w->count == 0)
>  		return 0;
>
> -	ret = req->engine->emit_flush(req,
> -				      I915_GEM_GPU_DOMAINS,
> -				      I915_GEM_GPU_DOMAINS);
> +	ret = req->engine->emit_flush(req, EMIT_BARRIER);
>  	if (ret)
>  		return ret;

Distinguishing flush-and-invalidate from BARRIER seems like a good idea, 
because here we really don't want to flush or invalidate the GPU's 
caches; we really only want some sort of synchronisation of memory 
activity before changing the register. Therefore ...

[snip]

> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 00723401f98c..76d0495943c3 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -292,8 +292,10 @@ struct intel_engine_cs {
>  	u32 ctx_desc_template;
>  	int		(*emit_request)(struct drm_i915_gem_request *request);
>  	int		(*emit_flush)(struct drm_i915_gem_request *request,
> -				      u32 invalidate_domains,
> -				      u32 flush_domains);
> +				      u32 mode);
> +#define EMIT_INVALIDATE	BIT(0)
> +#define EMIT_FLUSH	BIT(1)
> +#define EMIT_BARRIER	(EMIT_INVALIDATE | EMIT_FLUSH)

... I think these should be three (or maybe four) distinct operations, 
and collapsing BARRIER to FLUSH+INVALIDATE if the h/w can't do a simple 
barrier should be left to the gen-specific code. Also what about 
allowing for a FLUSH or INVALIDATE without BARRIER semantics? Do we care 
about whether the command streamer stalls before/after these operations?

So maybe start with

#define	EMIT_CS_STALL	BIT(0)	/* Stall CS before operation	*/
#define	EMIT_WRITEBACK	BIT(1)	/* Ensure caches written back	*/
#define	EMIT_DROP	BIT(2)	/* Drop contents of cache(s)	*/
#define	EMIT_POSTSYNC	BIT(3)	/* Wait until complete		*/

and then

#define	EMIT_BARRIER	(EMIT_CS_STALL+EMIT_POSTSYNC)
#define	EMIT_FLUSH	(EMIT_CS_STALL+EMIT_WRITEBACK)
#define	EMIT_INVALIDATE	(EMIT_DROP+EMIT_POSTSYNC)
#define	EMIT_SWITCH	(EMIT_FLUSH+EMIT_INVALIDATE)

or whatever combinations of basic operations are actually useful.

.Dave.


More information about the Intel-gfx mailing list