[Intel-gfx] [RFC][PATCH] drm/i915: Try to avoid most cache flushes on LLC platforms (WIP)

Mon Aug 5 23:25:30 CEST 2013

On Mon, Aug 05, 2013 at 11:13:24PM +0300, ville.syrjala at linux.intel.com wrote:
> From: Ville Syrjälä <ville.syrjala at linux.intel.com>
> 
> A bit more food for the cache discussions. My idea here is to track
> whether an object is part of any fb, and in case it is we assume it
> can be used for scanout, and thus may need some extra clflushes.
> But otherwise we'd try to avoid clflushes on LLC plarforms.
> 
> I also included some GFDT stuff in there just to think a bit about how
> it would work, but I left out the GFDT flushes themselves.
> 
> Also I didn't even boot-test this, so it may explode quite
> spectacularly.
> ---
>  drivers/gpu/drm/i915/i915_drv.h            |  5 +++
>  drivers/gpu/drm/i915/i915_gem.c            | 63 +++++++++++++++++++++++-------
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 13 +++++-
>  drivers/gpu/drm/i915/i915_gem_gtt.c        |  6 ++-
>  drivers/gpu/drm/i915/intel_display.c       |  2 +
>  drivers/gpu/drm/i915/intel_fb.c            |  1 +
>  6 files changed, 72 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 8b14e22..ec275ed 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -451,6 +451,8 @@ enum i915_cache_level {
>  	I915_CACHE_NONE = 0,
>  	I915_CACHE_LLC,
>  	I915_CACHE_LLC_MLC, /* gen6+, in docs at least! */
> +	I915_CACHE_WT, /* hsw gt3e */
> +	I915_CACHE_GFDT = 0x4, /* SNB/IVB: flag ORed w/ the cache mode */
>  };
>  
>  typedef uint32_t gen6_gtt_pte_t;
> @@ -1388,6 +1390,8 @@ struct drm_i915_gem_object {
>  
>  	/** for phy allocated objects */
>  	struct drm_i915_gem_phys_object *phys_obj;
> +
> +	atomic_t fb_count;
>  };
>  #define to_gem_object(obj) (&((struct drm_i915_gem_object *)(obj))->base)
>  
> @@ -1494,6 +1498,7 @@ struct drm_i915_file_private {
>  #define HAS_VEBOX(dev)          (INTEL_INFO(dev)->has_vebox_ring)
>  #define HAS_LLC(dev)            (INTEL_INFO(dev)->has_llc)
>  #define I915_NEED_GFX_HWS(dev)	(INTEL_INFO(dev)->need_gfx_hws)
> +#define HAS_GFDT(dev)		(0)
>  
>  #define HAS_HW_CONTEXTS(dev)	(INTEL_INFO(dev)->gen >= 6)
>  #define HAS_ALIASING_PPGTT(dev)	(INTEL_INFO(dev)->gen >=6 && !IS_VALLEYVIEW(dev))
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index bd6eb64..38d3241 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -394,6 +394,20 @@ shmem_pread_slow(struct page *page, int shmem_page_offset, int page_length,
>  	return ret ? - EFAULT : 0;
>  }
>  
> +static bool cpu_caches_coherent(struct drm_i915_gem_object *obj)
> +{
> +	return HAS_LLC(obj->base.dev) || obj->cache_level != I915_CACHE_NONE;
> +}

Yes, that much better expresses what we mean.

> +static bool
> +scanout_needs_clflush(struct drm_i915_gem_object *obj,
> +		      enum i915_cache_level cache_level)
> +{
> +	return cache_level != I915_CACHE_LLC &&
> +		cache_level != I915_CACHE_LLC_MLC &&
> +		atomic_read(&obj->fb_count) > 0;
> +}

(Aside, this also a good time to change LLC_MLC to L3!)

GFDT, similarly to WT, I think is purely a bit set on the GPU side. My
experience was that I could not dirty with the CPU then issue a GPU GFDT
flush without seeing cache dirt on the scanout. Actually, I think I just
found my issue in the SNB bspec... Apparently, I must not dirty the
surface before the previous flush is complete - which is solvable by using
the active render tracking. Just another little hoop to jump through for
GFDT.

I think we really want to see the llc/scanout patch first so that we can
give it a thorough test and debunking. Then WT as it is fairly trivial
and then seeing if we can finally get GFDT right.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre