[Intel-gfx] [PATCH] drm/i915: Use Write-Through cacheing for the display plane on Iris

Wed Jul 31 15:36:39 CEST 2013

On Wed, Jul 31, 2013 at 04:16:40PM +0300, Ville Syrjälä wrote:
> On Tue, Jul 30, 2013 at 08:25:56PM +0100, Chris Wilson wrote:
> > +	if (cache_level == I915_CACHE_NONE ||
> > +	    cache_level == I915_CACHE_WT) {
> >  		/* If we're coming from LLC cached, then we haven't
> >  		 * actually been tracking whether the data is in the
> >  		 * CPU cache or not, since we only allow one bit set
> >  		 * in obj->write_domain and have been skipping the clflushes.
> > -		 * Just set it to the CPU cache for now.
> > +		 * Do them now.
> >  		 */
> > -		WARN_ON(obj->base.write_domain & ~I915_GEM_DOMAIN_CPU);
> > -		WARN_ON(obj->base.read_domains & ~I915_GEM_DOMAIN_CPU);
> > +		if (obj->pages && !obj->stolen) {
> > +			u32 old_write_domain;
> >  
> > -		old_read_domains = obj->base.read_domains;
> > -		old_write_domain = obj->base.write_domain;
> > +			trace_i915_gem_object_clflush(obj);
> > +			drm_clflush_sg(obj->pages);
> >  
> > -		obj->base.read_domains = I915_GEM_DOMAIN_CPU;
> > -		obj->base.write_domain = I915_GEM_DOMAIN_CPU;
> > +			old_write_domain = obj->base.write_domain;
> > +			obj->base.write_domain &= ~I915_GEM_DOMAIN_CPU;
> >  
> > -		trace_i915_gem_object_change_domain(obj,
> > -						    old_read_domains,
> > -						    old_write_domain);
> > +			trace_i915_gem_object_change_domain(obj,
> > +							    obj->base.read_domains,
> > +							    old_write_domain);
> > +		}
> 
> With the change to i915_gem_clflush_object() this hunk could be dropped,
> no?

I liked the flush being explicit, so kept it in. It can be done either
way.

> And what about pwrite?

If anybody used pwrite to a scanout buffer, they are going to be
upset... Yes, it needs to clflush afterwards if WT.

> I must admit that I'm getting confused by our caches again.
> 
> Based on what I've read I *think* UC GPU accesses will snoop/invalidate
> the LLC and IA caches, so this kind of flush shouldn't be necessary (on
> LLC platforms that is).

Nope, you have to explicitly enable the snooping in the GPU (that's the
cacheable bit on non-LLC platforms), and then it only occurs upon
invalidation/flush commands.

> Supposedly the same goes for the pre-copy flushes
> in pread/pwrite. And the post-copy flush in pwrite is needed due to the
> display engine only.
> 
> Should we maybe add a special UC cache level for display to avoid pointless
> flushes on other UC buffers LLC platforms? That way userland could set most
> buffers to UC via the ioctl and use MOCS to override them to LLC on IVB.

No. We need the LLC bit set for coherent CPU mmaps which are now in
vogue to avoid having to use the fences for detiling, and always have
been preferred for linear buffers (i.e. asynchronous VBO maps and
texture streams).

> Also while looking through BSpec I noticed a slightly worrying note.
> Apparently, on HSW at least, L3/not-LLC cacheable surfaces can
> still evict dirty lines from L3 to LLC. The IVB flow diagrams leave me to
> think IVB could behave the same way, even though it's not really spelled
> out there. This would only be an issue when using MOCS since you can't
> configure such a caching mode through the PTEs alone.

Afaict, the render write flush is sufficient to write the dirty cache
lines to LLC/UC memory, so from the kernel/CPU perspective it never has
to worry about L3.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre