[Intel-gfx] [PATCH 2/4] drm/i915: Update rules for reading cache lines through the LLC
Daniel Vetter
daniel at ffwll.ch
Tue Aug 6 15:31:02 CEST 2013
On Tue, Aug 06, 2013 at 01:17:03PM +0100, Chris Wilson wrote:
> The LLC is a fun device. The cache is a distinct functional block within
> the SA that arbitrates access from both the CPU and GPU cores. As such
> all writes to memory land first in the LLC before further action is
> taken. For example, an uncached write from either the CPU or GPU will
> then proceed to memory and evict the cacheline from the LLC. This means that
> a read from the LLC always returns the correct information even if the PTE
> bit in the GPU differs from the PAT bit in the CPU. For the older
> snooping architecture on non-LLC parts, the fundamental principle still holds
> except that some coordination is required between the CPU and GPU to
> explicitly perform the snooping (which is handled by our request
> tracking).
>
> The upshot of this is that we know that we can issue a read from either
> LLC devices or snoopable memory and trust the contents of the cache -
> i.e. we can forgo a clflush before a read in these circumstances.
> Writing to memory from the CPU is a little more tricky as we have to
> consider that the scanout does not read from the CPU cache at all, but
> from main memory. So we currently have to treat every CPU write to
> uncached memory as requiring a flush to main memory for coherency
> with all consumers.
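If I'm reading this right, the rule the patch encodes boils down to the
below (my paraphrase in terms of the existing HAS_LLC()/i915_cache_level
definitions, with made-up helper names, just to check my understanding,
not a suggestion for the code itself):

static bool cpu_read_needs_clflush(struct drm_device *dev,
				   enum i915_cache_level level)
{
	/* On LLC parts every access goes through the LLC, and snooped
	 * objects stay coherent on non-LLC parts, so only uncached
	 * objects on non-LLC need a clflush before the read. */
	return !HAS_LLC(dev) && level == I915_CACHE_NONE;
}

static bool cpu_write_needs_clflush(enum i915_cache_level level)
{
	/* Scanout never snoops the CPU cache, so CPU writes to
	 * uncached objects must always be flushed out to memory. */
	return level == I915_CACHE_NONE;
}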
Since this is a behavioural change wrt cache coherency, can we please have
an igt testcase to exercise pwrite/pread coherency on uncached buffers on
LLC platforms? With the set_caching ioctl it should be fairly easy to add
another subtest to the relevant existing igts, roughly along the lines of
the sketch below.
/me is simply too paranoid about this stuff
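Something like the following (a rough, untested sketch using the raw
ioctls; the real subtest would of course use the igt gem_* wrappers and
drm_open_any(), and the device path and fill pattern here are just
placeholders): create a bo, set it to uncached, pwrite a pattern and
pread it back, asserting that nothing stale gets served out of the LLC.

#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <assert.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

#define OBJECT_SIZE 4096

int main(void)
{
	uint8_t out[OBJECT_SIZE], in[OBJECT_SIZE];
	struct drm_i915_gem_create create = { .size = OBJECT_SIZE };
	struct drm_i915_gem_caching caching = { 0 };
	struct drm_i915_gem_pwrite gem_pwrite = { 0 };
	struct drm_i915_gem_pread gem_pread = { 0 };
	int fd;

	fd = open("/dev/dri/card0", O_RDWR); /* placeholder path */
	assert(fd >= 0);
	assert(ioctl(fd, DRM_IOCTL_I915_GEM_CREATE, &create) == 0);

	/* Make the object uncached so pwrite/pread must handle the
	 * clflushing themselves. */
	caching.handle = create.handle;
	caching.caching = I915_CACHING_NONE;
	assert(ioctl(fd, DRM_IOCTL_I915_GEM_SET_CACHING, &caching) == 0);

	memset(out, 0xc5, sizeof(out));
	gem_pwrite.handle = create.handle;
	gem_pwrite.size = sizeof(out);
	gem_pwrite.data_ptr = (uintptr_t)out;
	assert(ioctl(fd, DRM_IOCTL_I915_GEM_PWRITE, &gem_pwrite) == 0);

	gem_pread.handle = create.handle;
	gem_pread.size = sizeof(in);
	gem_pread.data_ptr = (uintptr_t)in;
	assert(ioctl(fd, DRM_IOCTL_I915_GEM_PREAD, &gem_pread) == 0);

	/* A missing clflush on either side shows up as stale data. */
	assert(memcmp(out, in, sizeof(out)) == 0);
	return 0;
}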
Cheers, Daniel
>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Ville Syrjälä <ville.syrjala at linux.intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem.c | 22 ++++++++++++++--------
> 1 file changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index eec6fcb..5671dab 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -60,6 +60,12 @@ static long i915_gem_purge(struct drm_i915_private *dev_priv, long target);
> static void i915_gem_shrink_all(struct drm_i915_private *dev_priv);
> static void i915_gem_object_truncate(struct drm_i915_gem_object *obj);
>
> +static bool cpu_cache_is_coherent(struct drm_device *dev,
> + enum i915_cache_level level)
> +{
> + return HAS_LLC(dev) || level != I915_CACHE_NONE;
> +}
> +
> static inline void i915_gem_object_fence_lost(struct drm_i915_gem_object *obj)
> {
> if (obj->tiling_mode)
> @@ -510,8 +516,7 @@ i915_gem_shmem_pread(struct drm_device *dev,
> * read domain and manually flush cachelines (if required). This
> * optimizes for the case when the gpu will dirty the data
> * anyway again before the next pread happens. */
> - if (obj->cache_level == I915_CACHE_NONE)
> - needs_clflush = 1;
> + needs_clflush = !cpu_cache_is_coherent(dev, obj->cache_level);
> if (i915_gem_obj_ggtt_bound(obj)) {
> ret = i915_gem_object_set_to_gtt_domain(obj, false);
> if (ret)
> @@ -835,11 +840,11 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
> return ret;
> }
> }
> - /* Same trick applies for invalidate partially written cachelines before
> - * writing. */
> - if (!(obj->base.read_domains & I915_GEM_DOMAIN_CPU)
> - && obj->cache_level == I915_CACHE_NONE)
> - needs_clflush_before = 1;
> + /* Same trick applies to invalidate partially written cachelines read
> + * before writing. */
> + if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0)
> + needs_clflush_before =
> + !cpu_cache_is_coherent(dev, obj->cache_level);
>
> ret = i915_gem_object_get_pages(obj);
> if (ret)
> @@ -3650,7 +3655,8 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
>
> /* Flush the CPU cache if it's still invalid. */
> if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
> - i915_gem_clflush_object(obj);
> + if (!cpu_cache_is_coherent(obj->base.dev, obj->cache_level))
> + i915_gem_clflush_object(obj);
>
> obj->base.read_domains |= I915_GEM_DOMAIN_CPU;
> }
> --
> 1.8.4.rc1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch