[Intel-gfx] [PATCH v4] drm/i915 : Avoid superfluous invalidation of CPU cache lines

akash.goel at intel.com akash.goel at intel.com
Wed Dec 2 00:07:44 PST 2015

From: Akash Goel <akash.goel at intel.com>

When the object is moved out of the CPU read domain, its cachelines
are not invalidated immediately. The invalidation is deferred until the
next time the object is brought back into the CPU read domain.
However, the invalidation is done unconditionally, i.e. even when the
cachelines were already flushed as the object moved out of the CPU
write domain. That second clflush is redundant and can be skipped,
saving a full clflush pass over the object.
This is not a hypothetical case, though it is unlikely to occur often.
The aim is to detect changes to the backing storage whilst the
data is potentially in the CPU cache, and only clflush in those cases.
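The bookkeeping described above can be sketched as a small state model. This is a hypothetical, simplified illustration, not the i915 code: the `model_*` functions, `struct model_obj`, the `DOMAIN_*` values and `run_scenario()` are invented for the sketch, and a counter stands in for an actual clflush pass. It replays the case the patch targets (a CPU write, a move out of the CPU domain, then a return for reading) followed by a read-only round trip where the invalidation is still required.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical domain bits; the real driver uses I915_GEM_DOMAIN_*. */
enum { DOMAIN_CPU = 1 << 0, DOMAIN_GTT = 1 << 1 };

/* Simplified stand-in for the relevant fields of drm_i915_gem_object. */
struct model_obj {
	unsigned int read_domains;
	unsigned int write_domain;
	bool cache_dirty;   /* clflushes were omitted, lines may be dirty */
	bool cache_flushed; /* no lines of this object are in the CPU cache */
};

static int clflush_count; /* number of simulated clflush passes */

static void model_clflush(struct model_obj *obj)
{
	clflush_count++;           /* stands in for clflushing every line */
	obj->cache_dirty = false;
	obj->cache_flushed = true;
}

/* Leaving the CPU write domain: dirty lines are written back. */
static void model_move_to_gtt(struct model_obj *obj)
{
	assert(!(obj->cache_flushed && obj->cache_dirty));
	if (obj->write_domain == DOMAIN_CPU)
		model_clflush(obj);
	obj->read_domains = DOMAIN_GTT;
	obj->write_domain = DOMAIN_GTT;
}

/* Re-entering the CPU read domain: invalidate only if lines may be stale. */
static void model_set_to_cpu(struct model_obj *obj, bool write)
{
	assert(!(obj->cache_flushed && obj->cache_dirty));
	if ((obj->read_domains & DOMAIN_CPU) == 0) {
		if (!obj->cache_flushed)
			model_clflush(obj);
		/* CPU access from here on may repopulate the cache. */
		obj->cache_flushed = false;
		obj->read_domains |= DOMAIN_CPU;
	}
	if (write) {
		obj->read_domains = DOMAIN_CPU;
		obj->write_domain = DOMAIN_CPU;
	}
}

/* Replays the scenario and returns how many clflush passes ran. */
int run_scenario(void)
{
	struct model_obj obj = {
		.read_domains = DOMAIN_CPU, .write_domain = DOMAIN_CPU,
		.cache_dirty = true, .cache_flushed = false,
	};

	clflush_count = 0;
	model_move_to_gtt(&obj);       /* after a CPU write: flush on the way out */
	model_set_to_cpu(&obj, false); /* skipped, cache_flushed was set */
	model_move_to_gtt(&obj);       /* read-only CPU use: nothing to write back */
	model_set_to_cpu(&obj, false); /* clean lines may be stale: clflush */
	return clflush_count;
}
```

In this model the sequence costs two clflush passes; without the `cache_flushed` check the first return to the CPU domain would add a third.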

v2: Made the comment more verbose (Ville/Chris)
    Added doc for 'cache_clean' field (Daniel)

v3: Updated the comment to assuage an apprehension regarding the
    speculative-prefetching behavior of HW (Ville/Chris)

v4: Renamed 'cache_clean' to 'cache_flushed' as it's more appropriate (Ville)
    Made minor updates to the comments for clarity (Chris)

Testcase: igt/gem_concurrent_blit
Testcase: igt/benchmarks/gem_set_domain
Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel at intel.com>
 drivers/gpu/drm/i915/i915_drv.h |  9 +++++++++
 drivers/gpu/drm/i915/i915_gem.c | 19 ++++++++++++++++++-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 11ae5a5..e6e4bb0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2100,6 +2100,15 @@ struct drm_i915_gem_object {
 	unsigned int cache_level:3;
 	unsigned int cache_dirty:1;
+	/*
+	 * Tracks whether the CPU cache has been completely flushed, in
+	 * which case no CPU cachelines hold data for the object.
+	 * cache_flushed therefore also implies !cache_dirty (no data in
+	 * the cachelines, so nothing can be dirty either).
+	 * cache_dirty just tracks whether we have been omitting clflushes.
+	 */
+	unsigned int cache_flushed:1;
 	unsigned int frontbuffer_bits:INTEL_FRONTBUFFER_BITS;
 	unsigned int pin_display;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 33adc8f..cdc50d8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3552,6 +3552,7 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj,
 	obj->cache_dirty = false;
+	obj->cache_flushed = true;
 	return true;
@@ -3982,7 +3983,23 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
 	/* Flush the CPU cache if it's still invalid. */
 	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
-		i915_gem_clflush_object(obj, false);
+		/* When the object was moved out of the CPU domain following a
+		 * CPU write, we will have flushed it out of the CPU cache (and
+		 * marked the object as cache_flushed).
+		 * After the clflush we know that this object cannot be in the
+		 * CPU cache, nor can it be speculatively loaded into the CPU
+		 * cache as our objects are page-aligned and speculation cannot
+		 * cross page boundaries. So whilst the cache_flushed flag is
+		 * set, we know that any future access to the object's pages
+		 * will miss the CPU cache and have to be serviced from main
+		 * memory (where they will pick up any writes through the GTT or
+		 * by the GPU) i.e. we do not need another clflush here and now
+		 * to invalidate the CPU cache as we prepare to read from the
+		 * object.
+		 */
+		if (!obj->cache_flushed)
+			i915_gem_clflush_object(obj, false);
+		obj->cache_flushed = false;
 		obj->base.read_domains |= I915_GEM_DOMAIN_CPU;
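The comment's argument rests on page alignment: if speculative prefetches never cross a page boundary, an access anywhere outside a page-aligned object cannot pull the object's lines into the cache. The sketch below checks that reasoning arithmetically under stated assumptions: 4 KiB pages, 64-byte cachelines, and a deliberately simplified prefetcher model in which a trigger address may only touch lines in its own page. `page_of()`, `prefetch_may_touch()` and `object_shielded()` are hypothetical helpers, not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12   /* assumption: 4 KiB pages */
#define MODEL_LINE_SIZE  64u  /* assumption: 64-byte cachelines */

static uintptr_t page_of(uintptr_t addr)
{
	return addr >> MODEL_PAGE_SHIFT;
}

/* Simplified model: a prefetch triggered at 'trigger' stays in its page. */
static bool prefetch_may_touch(uintptr_t trigger, uintptr_t target)
{
	return page_of(trigger) == page_of(target);
}

/*
 * Returns true if an access at 'trigger' (outside the object) can never
 * cause any line of [start, start + size) to be prefetched. This holds
 * whenever start and size are page-aligned.
 */
static bool object_shielded(uintptr_t start, uintptr_t size, uintptr_t trigger)
{
	if (trigger >= start && trigger < start + size)
		return false; /* an access to the object itself */
	for (uintptr_t t = start; t < start + size; t += MODEL_LINE_SIZE)
		if (prefetch_may_touch(trigger, t))
			return false;
	return true;
}
```

For a page-aligned object at 0x10000 of size 0x2000, a neighbour access at 0x0FFFF or 0x12000 cannot reach it; an unaligned object starting at 0x10800 shares page 0x10 with a neighbour at 0x107FF and would not be shielded.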
