[Intel-gfx] [PATCH v3] drm/i915 : Avoid superfluous invalidation of CPU cache lines
Ville Syrjälä
ville.syrjala at linux.intel.com
Tue Dec 1 04:34:41 PST 2015
On Mon, Nov 30, 2015 at 12:41:05PM +0530, akash.goel at intel.com wrote:
> From: Akash Goel <akash.goel at intel.com>
>
> When the object is moved out of the CPU read domain, the cachelines
> are not invalidated immediately. The invalidation is deferred until the
> next time the object is brought back into the CPU read domain.
> But the invalidation is done unconditionally, i.e. even for the case
> where the cachelines were already flushed when the object moved out
> of the CPU write domain. This is avoidable, and skipping it is an optimization.
> This is not a hypothetical case, though it is unlikely to occur often.
> The aim is to detect changes to the backing storage whilst the
> data is potentially in the CPU cache, and only clflush in those cases.
>
> v2: Made the comment more verbose (Ville/Chris)
> Added doc for 'cache_clean' field (Daniel)
>
> v3: Updated the comment to assuage an apprehension regarding the
> speculative-prefetching behavior of HW (Ville/Chris)
>
> Testcase: igt/gem_concurrent_blit
> Testcase: igt/benchmarks/gem_set_domain
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Signed-off-by: Akash Goel <akash.goel at intel.com>
> ---
> drivers/gpu/drm/i915/i915_drv.h | 9 +++++++++
> drivers/gpu/drm/i915/i915_gem.c | 17 ++++++++++++++++-
> 2 files changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 11ae5a5..f97795e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2100,6 +2100,15 @@ struct drm_i915_gem_object {
> unsigned int cache_level:3;
> unsigned int cache_dirty:1;
>
> + /*
> + * Tracks if the CPU cache has been completely clflushed.
> + * !cache_clean does not imply cache_dirty (there is some data in the
> + * CPU cachelines, but has not been dirtied), but cache_clean
> + * does imply !cache_dirty (no data in cachelines, so not dirty also).
> + * Actually cache_dirty tracks whether we have been omitting clflushes.
> + */
> + unsigned int cache_clean:1;
Maybe this should be called cache_flushed or something similar? "clean"
really makes me think !dirty.
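
To make the naming concern concrete, here is a minimal standalone sketch of the relationship the patch comment describes between the two bits. It is not driver code: the helper names cache_bits_valid() and cache_bits_name() are hypothetical, and "flushed" stands in for the patch's cache_clean.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helpers, not part of i915. They model the two bits
 * from the patch: cache_dirty and cache_clean (called "flushed" here).
 */

/* flushed implies !dirty, so the only illegal pair is flushed && dirty. */
static bool cache_bits_valid(bool dirty, bool flushed)
{
	return !(flushed && dirty);
}

/* Name the three legal states; !flushed does not imply dirty. */
static const char *cache_bits_name(bool dirty, bool flushed)
{
	assert(cache_bits_valid(dirty, flushed));
	if (flushed)
		return "flushed";             /* no lines in the CPU cache */
	if (dirty)
		return "dirty";               /* a clflush has been omitted */
	return "resident, not dirty";         /* lines may be cached, unmodified */
}
```

The third state is the one the name "clean" obscures: the cache is not dirty, yet the flag is still 0 because lines may be resident.
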
> +
> unsigned int frontbuffer_bits:INTEL_FRONTBUFFER_BITS;
>
> unsigned int pin_display;
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 33adc8f..7376be8 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3552,6 +3552,7 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj,
> trace_i915_gem_object_clflush(obj);
> drm_clflush_sg(obj->pages);
> obj->cache_dirty = false;
> + obj->cache_clean = true;
>
> return true;
> }
> @@ -3982,7 +3983,21 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
>
> /* Flush the CPU cache if it's still invalid. */
> if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
> - i915_gem_clflush_object(obj, false);
> + /* If an object is moved out of the CPU domain following a
> + * CPU write and before a GPU or GTT write, we will clflush
> + * it out of the CPU cache, and mark the cache as clean.
> + * After clflushing we know that this object cannot be in the
> + * CPU cache, nor can it be speculatively loaded into the CPU
> + * cache as our objects are page-aligned (& speculation cannot
> + * cross page boundaries). Whilst this flag is set, we know
> + * that any future access to the object's pages will miss the
> + * stale cache and have to be serviced from main memory, i.e.
> + * we do not need another clflush to invalidate the CPU cache
> + * in preparing to read from the object.
> + */
> + if (!obj->cache_clean)
> + i915_gem_clflush_object(obj, false);
> + obj->cache_clean = false;
The comment here talks about moving the object out of the CPU domain,
which made me think there was a bug (false vs. true). But this code
moves the object into the CPU domain, so it's fine. I wonder if there's
a better place for the comment (e.g. where we set cache_clean = true)?
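
The flag's life cycle across both directions can be sketched as a toy model. This is a simplified illustration under stated assumptions, not the i915 implementation: struct toy_obj, the toy_* functions and the TOY_DOMAIN_CPU constant are all hypothetical, the flush is reduced to a counter, and the early-return paths of the real i915_gem_clflush_object() are ignored.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_DOMAIN_CPU 0x1u  /* stand-in for I915_GEM_DOMAIN_CPU */

/* Hypothetical, heavily reduced stand-in for drm_i915_gem_object. */
struct toy_obj {
	unsigned int read_domains;
	bool cache_dirty;
	bool cache_flushed;          /* the patch's cache_clean */
	unsigned int clflush_count;  /* counts simulated clflushes */
};

/* Models the tail of i915_gem_clflush_object(): flush, then mark. */
static void toy_clflush(struct toy_obj *obj)
{
	obj->clflush_count++;
	obj->cache_dirty = false;
	obj->cache_flushed = true;   /* nothing of this object is cached now */
}

/* Leaving the CPU domain: only a preceding CPU write forces a flush. */
static void toy_leave_cpu_domain(struct toy_obj *obj, bool cpu_wrote)
{
	if (cpu_wrote)
		toy_clflush(obj);
	obj->read_domains &= ~TOY_DOMAIN_CPU;
}

/* Entering the CPU domain: skip the invalidate if already flushed. */
static void toy_enter_cpu_domain(struct toy_obj *obj)
{
	if ((obj->read_domains & TOY_DOMAIN_CPU) == 0) {
		if (!obj->cache_flushed)
			toy_clflush(obj);
		/* The CPU may load lines from here on. */
		obj->cache_flushed = false;
		obj->read_domains |= TOY_DOMAIN_CPU;
	}
}
```

In this model a write, leave, enter sequence costs one flush instead of two, while a read-only leave followed by enter still pays for the invalidation, because the flag was cleared on the previous entry.
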
>
> obj->base.read_domains |= I915_GEM_DOMAIN_CPU;
> }
> --
> 1.9.2
--
Ville Syrjälä
Intel OTC