[Intel-gfx] [PATCH 3/4] drm/i915: Update rules for writing through the LLC with the cpu

Ville Syrjälä ville.syrjala at linux.intel.com
Tue Aug 6 21:08:38 CEST 2013


On Tue, Aug 06, 2013 at 07:45:40PM +0100, Chris Wilson wrote:
> On Tue, Aug 06, 2013 at 09:03:01PM +0300, Ville Syrjälä wrote:
> > On Tue, Aug 06, 2013 at 05:16:14PM +0100, Chris Wilson wrote:
> > > On Tue, Aug 06, 2013 at 05:24:25PM +0300, Ville Syrjälä wrote:
> > > > On Tue, Aug 06, 2013 at 01:17:04PM +0100, Chris Wilson wrote:
> > > > > +static bool cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
> > > > > +{
> > > > > +	if (!cpu_cache_is_coherent(obj->base.dev, obj->cache_level))
> > > > > +		return true;
> > > > > +
> > > > > +	return atomic_read(&obj->fb_count) > 0;
> > > > > +}
> > > > 
> > > > I was thinking a bit about this fb_count thing. The other option would
> > > > be to track actual scanout activity, which wouldn't be hard to do, but
> > > > I guess that could shift a bunch of the clflush cost to page flip time.
> > > > So maybe we don't want that?
> > > 
> > > Different series, see the frontbuffer write tracking. As for page flip,
> > > that is currently the only time we clflush...
> > >  
> > > > Also i915_gem_sw_finish_ioctl() still looks at pin_count as an
> > > > indication whether scanout is happening. We should add an fb_count
> > > > check there as well. Maybe we should leave the pin_count check there,
> > > > so that framebuffers that aren't being scanned out currently can be
> > > > massaged a bit more freely before a clflush needs to be issued?
> > > 
> > > It does do a fb_count check, the pin_count + fb_count is a really strong
> > > indicator of active scanout.
> > 
> > The fb_count checks is done only through the eventual check in
> > i915_gem_clflush_object() which is why I didn't think of it.
> > 
> > On non-LLC platforms it also ends up doing the flush for all
> > non-snooped non-scanout buffers, which would get optimized away if the
> > fb_count check was where the pin_count check is.
> 
> But it does still do the fb_count check before the flush? What am I
> missing here?

I believe the code now ends up doing this (on non-LLC):
 if (pin_count && (cache_level == UC || fb_count > 0) && write_domain == CPU)
   clflush();

So it'll flush also all pinned non-snooped buffers whether they be
scanout or not. We could delay such flushes until the bo gets
moved away from the CPU write domain. But maybe it's just a nonsense
scenario. Why would someone write to a pinned bo unless the pinning is
due to scanout?

> > > > > -	    obj->tiling_mode == I915_TILING_NONE &&
> > > > > -	    obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
> > > > > +	if (obj->tiling_mode == I915_TILING_NONE &&
> > > > > +	    !cpu_write_needs_clflush(obj)) {
> > > > 
> > > > This would break non-LLC platforms, I think. Before, GTT writes were
> > > > used for non-snooped buffers w/ clean CPU caches (write domain != cpu),
> > > > which makes sense since access via GTT doesn't snoop ever according
> > > > to the docs. Now I think it'll do GTT writes for snooped non-scanout
> > > > buffers.
> > > 
> > > D'oh. What I was aiming for was to always use shmem writes on snooped
> > > bo, irrespective of whether it is GPU hot.
> > > 
> > > s/!cpu_write_needs_clflush/cpu_write_needs_clflush/
> > 
> > So that means a GTT write for anything non-snooped or scanout. If it's
> > snooped scanout (I guess we may not really care about such a weird
> > combination), or non-snooped but in CPU write domain, that would
> > still have issues with possibly dirty data in the CPU caches.
> 
> Hmm, that should be fixed up by i915_gem_gtt_pwrite_fast() as we flush
> all the caches before we start overwriting. I've thoroughly confused
> myself over which paths do the fancy tricks. :(
>  
> > Might be intersting to try on real hardware what the GTT access
> > snooping behaviour really is, especially for some modern non-LLC
> > system like VLV.
> 
> I remember why I have the !DOMAIN_CPU test now. If we have already paid
> the price to pull it back into the CPU domain, let it rest there until
> it is used again.
> 
> So as it stands, I think we want:
> 
> if (obj->tiling_mode == I915_TILING_NONE &&
>     obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
>     cpu_write_needs_clflush(obj))
> 	i915_gem_gtt_pwrite_fast();

Yeah that looks better to me at least.

-- 
Ville Syrjälä
Intel OTC



More information about the Intel-gfx mailing list