[Intel-gfx] [PATCH 3/4] drm/i915: Update rules for writing through the LLC with the cpu

Tue Aug 6 18:16:14 CEST 2013

On Tue, Aug 06, 2013 at 05:24:25PM +0300, Ville Syrjälä wrote:
> On Tue, Aug 06, 2013 at 01:17:04PM +0100, Chris Wilson wrote:
> > +static bool cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
> > +{
> > +	if (!cpu_cache_is_coherent(obj->base.dev, obj->cache_level))
> > +		return true;
> > +
> > +	return atomic_read(&obj->fb_count) > 0;
> > +}
> 
> I was thinking a bit about this fb_count thing. The other option would
> be to track actual scanout activity, which wouldn't be hard to do, but
> I guess that could shift a bunch of the clflush cost to page flip time.
> So maybe we don't want that?

Different series, see the frontbuffer write tracking. As for page flip,
that is currently the only time we clflush...

> Also i915_gem_sw_finish_ioctl() still looks at pin_count as an
> indication whether scanout is happening. We should add an fb_count
> check there as well. Maybe we should leave the pin_count check there,
> so that framebuffers that aren't being scanned out currently can be
> massaged a bit more freely before a clflush needs to be issued?

It does do a fb_count check, the pin_count + fb_count is a really strong
indicator of active scanout.

But sw_finish is truly legacy, so it just has to work and be as simple
as possible.

> > -	if (obj->cache_level == I915_CACHE_NONE &&
> > -	    obj->tiling_mode == I915_TILING_NONE &&
> > -	    obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
> > +	if (obj->tiling_mode == I915_TILING_NONE &&
> > +	    !cpu_write_needs_clflush(obj)) {
> 
> This would break non-LLC platforms, I think. Before, GTT writes were
> used for non-snooped buffers w/ clean CPU caches (write domain != cpu),
> which makes sense since access via GTT doesn't snoop ever according
> to the docs. Now I think it'll do GTT writes for snooped non-scanout
> buffers.

D'oh. What I was aiming for was to always use shmem writes on snooped
bo, irrespective of whether it is GPU hot.

s/!cpu_write_needs_clflush/cpu_write_needs_clflush/

> 
> >  		ret = i915_gem_gtt_pwrite_fast(dev, obj, args, file);
> >  		/* Note that the gtt paths might fail with non-page-backed user
> >  		 * pointers (e.g. gtt mappings when moving data between
> > @@ -3299,7 +3305,7 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj)
> >  	 * snooping behaviour occurs naturally as the result of our domain
> >  	 * tracking.
> >  	 */
> > -	if (obj->cache_level != I915_CACHE_NONE)
> > +	if (!cpu_write_needs_clflush(obj))
> >  		return;
> 
> I would actually prefer to kill this check completely, and move the
> decision whether to do a clflush to the callers.

I saw, I differ in opinion.

> There are three places that I spotted which lack such a check;
> i915_gem_execbuffer_move_to_gpu() and i915_gem_object_set_to_cpu_domain()
> need to flush only when !cpu_cache_is_coherent(), whereas
> i915_gem_object_flush_cpu_write_domain() needs to flush for scanout as
> well.

I like the approach of trying to keep as much of the complexity inside
i915_gem.c and exporting simple rules to the callers.

> > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> > index 0584480..cc5eaba 100644
> > --- a/drivers/gpu/drm/i915/intel_display.c
> > +++ b/drivers/gpu/drm/i915/intel_display.c
> > @@ -9443,6 +9443,8 @@ static void intel_user_framebuffer_destroy(struct drm_framebuffer *fb)
> >  	struct intel_framebuffer *intel_fb = to_intel_framebuffer(fb);
> >  
> >  	drm_framebuffer_cleanup(fb);
> > +
> > +	atomic_dec(&intel_fb->obj->fb_count);
> 
> Where did the the other atomic_dec() go (was in intel_fbdev_destroy())?

Missed. I blame the lack of intel_framebuffer_fini(), that would be a
better patch first.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre