[Intel-gfx] [PATCH] drm/i915: Reduce engine->emit_flush() to a single mode parameter
Dave Gordon
david.s.gordon at intel.com
Thu Jul 28 14:57:11 UTC 2016
On 27/07/16 11:53, Chris Wilson wrote:
> Rather than passing a complete set of GPU cache domains for either
> invalidation or for flushing, or even both, just pass a single parameter
> to the engine->emit_flush to determine the required operations.
>
> engine->emit_flush(GPU, 0) -> engine->emit_flush(EMIT_INVALIDATE)
> engine->emit_flush(0, GPU) -> engine->emit_flush(EMIT_FLUSH)
> engine->emit_flush(GPU, GPU) -> engine->emit_flush(EMIT_FLUSH | EMIT_INVALIDATE)
>
> This allows us to extend the behaviour easily in future, for example if
> we want just a command barrier without the overhead of flushing.
>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Dave Gordon <david.s.gordon at intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem_context.c | 2 +-
> drivers/gpu/drm/i915/i915_gem_execbuffer.c | 2 +-
> drivers/gpu/drm/i915/i915_gem_gtt.c | 10 ++----
> drivers/gpu/drm/i915/i915_gem_request.c | 2 +-
> drivers/gpu/drm/i915/intel_lrc.c | 23 +++++-------
> drivers/gpu/drm/i915/intel_ringbuffer.c | 57 +++++++++++-------------------
> drivers/gpu/drm/i915/intel_ringbuffer.h | 6 ++--
> 7 files changed, 38 insertions(+), 64 deletions(-)
[snip]
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 39fa9eb10514..671b1cab5e54 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -1666,8 +1666,7 @@ static int hsw_mm_switch(struct i915_hw_ppgtt *ppgtt,
> int ret;
>
> /* NB: TLBs must be flushed and invalidated before a switch */
> - ret = engine->emit_flush(req,
> - I915_GEM_GPU_DOMAINS, I915_GEM_GPU_DOMAINS);
> + ret = engine->emit_flush(req, EMIT_INVALIDATE | EMIT_FLUSH);
The example in the commit message says "EMIT_FLUSH | EMIT_INVALIDATE"
which I think looks much nicer. Flush *after* invalidate would be pretty
meaningless!
For even more syntactic sugar, we could choose "FLUSH + INVALIDATE" as +
and | are equivalent for disjoint bitfields.
> int intel_logical_ring_alloc_request_extras(struct drm_i915_gem_request *request)
> @@ -998,9 +998,7 @@ static int intel_logical_ring_workarounds_emit(struct drm_i915_gem_request *req)
> if (w->count == 0)
> return 0;
>
> - ret = req->engine->emit_flush(req,
> - I915_GEM_GPU_DOMAINS,
> - I915_GEM_GPU_DOMAINS);
> + ret = req->engine->emit_flush(req, EMIT_BARRIER);
> if (ret)
> return ret;
Distinguishing flush-and-invalidate from BARRIER seems like a good idea,
because here we really don't want to flush or invalidate the GPU's
caches; we really only want some sort of synchronisation of memory
activity before changing the register. Therefore ...
[snip]
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 00723401f98c..76d0495943c3 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -292,8 +292,10 @@ struct intel_engine_cs {
> u32 ctx_desc_template;
> int (*emit_request)(struct drm_i915_gem_request *request);
> int (*emit_flush)(struct drm_i915_gem_request *request,
> - u32 invalidate_domains,
> - u32 flush_domains);
> + u32 mode);
> +#define EMIT_INVALIDATE BIT(0)
> +#define EMIT_FLUSH BIT(1)
> +#define EMIT_BARRIER (EMIT_INVALIDATE | EMIT_FLUSH)
... I think these should be three (or maybe four) distinct operations,
and collapsing BARRIER to FLUSH+INVALIDATE if the h/w can't do a simple
barrier should be left to the gen-specific code. Also what about
allowing for a FLUSH or INVALIDATE without BARRIER semantics? Do we care
about whether the command streamer stalls before/after these operations?
So maybe start with
#define EMIT_CS_STALL BIT(0) /* Stall CS before operation */
#define EMIT_WRITEBACK BIT(1) /* Ensure caches written back */
#define EMIT_DROP BIT(2) /* Drop contents of cache(s) */
#define EMIT_POSTSYNC BIT(3) /* Wait until complete */
and then
#define EMIT_BARRIER (EMIT_CS_STALL+EMIT_POSTSYNC)
#define EMIT_FLUSH (EMIT_CS_STALL+EMIT_WRITEBACK)
#define EMIT_INVALIDATE (EMIT_DROP+EMIT_POSTSYNC)
#define EMIT_SWITCH (EMIT_FLUSH+EMIT_INVALIDATE)
or whatever combinations of basic operations are actually useful.
.Dave.
More information about the Intel-gfx
mailing list