[Intel-gfx] [PATCH 1/2] drm/i915: Use Write-Through cacheing for the display plane on Iris

Fri Aug 2 22:03:34 CEST 2013

On Fri, Aug 02, 2013 at 05:51:49PM +0300, Ville Syrjälä wrote:
> On Thu, Aug 01, 2013 at 06:39:54PM +0100, Chris Wilson wrote:
> > Haswell GT3e has the unique feature of supporting Write-Through caching
> > of objects within the eLLC/LLC. The purpose of this is to enable the display
> > plane to remain coherent whilst objects lie resident in the eLLC/LLC - so
> > that we, in theory, get the best of both worlds, perfect display and fast
> > access.
> > 
> > However, we still need to be careful as the CPU does not see the WT when
> > accessing the cache. In particular, this means that we need to flush the
> > cache lines after writing to an object through the CPU, and on
> > transitioning from a cached state to WT.
> > 
> > v2: Actually do the clflush on transition to WT, nagging by Ville.
> > v3: Flush the CPU cache after writes into WT objects.
> 
> v4?
v3.2: ditto

> >  {
> >  	if (obj->tiling_mode)
> > @@ -510,7 +520,7 @@ i915_gem_shmem_pread(struct drm_device *dev,
> >  		 * read domain and manually flush cachelines (if required). This
> >  		 * optimizes for the case when the gpu will dirty the data
> >  		 * anyway again before the next pread happens. */
> > -		if (obj->cache_level == I915_CACHE_NONE)
> > +		if (!is_cpu_read_cached(obj->cache_level))
> 
> Based on what we discussed on irc about GPU snooping even on UC
> accesses, couldn't this even be:
> 
> !HAS_LLC(dev) && obj->cache_level == I915_CACHE_NONE

Yes, I came to that conclusion later on as well. I like having the
is_cached() functions to try and clarify the reason for the tests, but I
don't think I have the names right yet.
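
For illustration only, a minimal sketch of how such predicates might
look. This is not the code from the patch: every sketch_* name, the enum
and the has_llc flag are hypothetical. The read rule follows the
suggestion quoted above (flush only for uncached objects on !LLC). The
write rule assumes that only a fully LLC-cached object counts as "CPU
write cached", so both UC and WT objects need care after a CPU write.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's cache levels. */
enum sketch_cache_level {
	SKETCH_CACHE_NONE,	/* uncached (UC) */
	SKETCH_CACHE_LLC,	/* fully cached in the LLC */
	SKETCH_CACHE_WT,	/* write-through, GT3e only */
};

/*
 * Must cachelines be flushed before the CPU reads the object?
 * Assumption from the thread: on LLC platforms even UC accesses
 * snoop, so only uncached objects on !LLC need the flush.
 */
static bool sketch_cpu_read_needs_clflush(bool has_llc,
					  enum sketch_cache_level level)
{
	return !has_llc && level == SKETCH_CACHE_NONE;
}

/*
 * Is a CPU write immediately coherent for every consumer?
 * Assumption: only LLC-cached objects qualify; UC and WT objects
 * need a flush (or a write through the GTT) to reach the display.
 */
static bool sketch_is_cpu_write_cached(enum sketch_cache_level level)
{
	switch (level) {
	case SKETCH_CACHE_LLC:
		return true;
	case SKETCH_CACHE_NONE:
	case SKETCH_CACHE_WT:
		return false;
	}
	assert(!"unknown cache level");
	return false;
}
```

Keeping the read and write questions in separate helpers makes the
asymmetry visible: a WT object needs no flush before a CPU read on an
LLC platform, but it does after a CPU write.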

> BTW I was toying around a bit w/ gem_cacheing. I modified it to map the
> scratch bo just once and use DRM_I915_GEM_WAIT to wait for the GPU so
> that I could avoid the set_domain stuff. Then I ran it w/ both cached
> and uncached bo, and it passes on IVB in both cases. On ILK it fails
> in the uncached bo case, but passes in the cached bo case. So that
> result appears to support the notion that UC accesses snoop on LLC
> platforms. But of course I didn't try other use cases than the BLT
> vs. CPU that gem_cacheing tests.

[snipping similar comments]

> > -	if (obj->cache_level == I915_CACHE_NONE &&
> > +	if (!is_cpu_write_cached(obj->cache_level) &&
> >  	    obj->tiling_mode == I915_TILING_NONE &&
> >  	    obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
> 
> Might we want to skip the GTT path on LLC platforms completely?

If we do need to clflush (e.g. an uncached buffer for UC/WT scanouts), then
our experiments (last conducted on SNB, iirc) suggest that we prefer to
use GTT WC writes. But for real-world cases of pwrite, we use shmem
on LLC, and normally GTT with !LLC.
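
A rough sketch of the path selection described by the hunk above,
assuming the three conditions are all that matter. The struct, its
fields and the sketch_* names are invented for illustration and do not
exist in the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type: which pwrite path to try first. */
enum sketch_pwrite_path {
	SKETCH_PWRITE_GTT,	/* write-combined writes through the GTT */
	SKETCH_PWRITE_SHMEM,	/* CPU writes into the shmem pages */
};

/* Hypothetical reduction of the object state the test looks at. */
struct sketch_obj {
	bool cpu_write_cached;		/* false for UC and WT objects */
	bool tiled;			/* tiling_mode != I915_TILING_NONE */
	bool in_cpu_write_domain;	/* write_domain == I915_GEM_DOMAIN_CPU */
};

/*
 * Mirrors the quoted condition: take the GTT path only for linear
 * objects that are not CPU write cached and are not already in the
 * CPU write domain. Everything else goes through shmem.
 */
static enum sketch_pwrite_path
sketch_pick_pwrite_path(const struct sketch_obj *obj)
{
	assert(obj != NULL);

	if (!obj->cpu_write_cached &&
	    !obj->tiled &&
	    !obj->in_cpu_write_domain)
		return SKETCH_PWRITE_GTT;

	return SKETCH_PWRITE_SHMEM;
}
```

With this shape, an LLC-cached object always lands on shmem, and a
linear UC or WT scanout buffer lands on the GTT path, which matches the
split described above.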
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre