[Intel-gfx] [PATCH 3/4] drm/i915: Flush all user surfaces prior to first use

Francisco Jerez currojerez at riseup.net
Tue Nov 12 19:58:54 UTC 2019


Chris Wilson <chris at chris-wilson.co.uk> writes:

> Quoting Francisco Jerez (2019-07-24 21:37:24)
>> Chris Wilson <chris at chris-wilson.co.uk> writes:
>> 
>> > Since userspace has the ability to bypass the CPU cache from within its
>> > unprivileged command stream, we have to flush the CPU cache to memory
>> > in order to overwrite the previous contents on creation.
>> >
>> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>> > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
>> > Cc: stablevger.kernel.org
>> > ---
>> >  drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 26 ++++++-----------------
>> >  1 file changed, 7 insertions(+), 19 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
>> > index d2a1158868e7..f752b326d399 100644
>> > --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
>> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
>> > @@ -459,7 +459,6 @@ i915_gem_object_create_shmem(struct drm_i915_private *i915, u64 size)
>> >  {
>> >       struct drm_i915_gem_object *obj;
>> >       struct address_space *mapping;
>> > -     unsigned int cache_level;
>> >       gfp_t mask;
>> >       int ret;
>> >  
>> > @@ -498,24 +497,13 @@ i915_gem_object_create_shmem(struct drm_i915_private *i915, u64 size)
>> >       obj->write_domain = I915_GEM_DOMAIN_CPU;
>> >       obj->read_domains = I915_GEM_DOMAIN_CPU;
>> >  
>> > -     if (HAS_LLC(i915))
>> > -             /* On some devices, we can have the GPU use the LLC (the CPU
>> > -              * cache) for about a 10% performance improvement
>> > -              * compared to uncached.  Graphics requests other than
>> > -              * display scanout are coherent with the CPU in
>> > -              * accessing this cache.  This means in this mode we
>> > -              * don't need to clflush on the CPU side, and on the
>> > -              * GPU side we only need to flush internal caches to
>> > -              * get data visible to the CPU.
>> > -              *
>> > -              * However, we maintain the display planes as UC, and so
>> > -              * need to rebind when first used as such.
>> > -              */
>> > -             cache_level = I915_CACHE_LLC;
>> > -     else
>> > -             cache_level = I915_CACHE_NONE;
>> > -
>> > -     i915_gem_object_set_cache_coherency(obj, cache_level);
>> > +     /*
>> > +      * Note that userspace has control over cache-bypass
>> > +      * via its command stream, so even on LLC architectures
>> > +      * we have to flush out the CPU cache to memory to
>> > +      * clear previous contents.
>> > +      */
>> > +     i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
>> >  
>> 
>> I'm not sure if you've seen my comments about this in an internal thread
>> you were CC'ed to: I don't think this patch will have the intended
>> effect.  The various clflushes this could trigger before the first use
>> of the buffer are conditional on "obj->cache_dirty &
>> ~obj->cache_coherent", which will always be false on LLC platforms
>> AFAICT, because on those platforms i915_gem_object_set_cache_coherency()
>> will always set bit 0 of obj->cache_coherent.
>
> You only need to flush the stale written-to cachelines, so that the page
> content is correct on reuse of the foreign struct page. After that point,
> the CPU cache is managed by the client.

Point was that the code above wouldn't have necessarily led to *any*
flushing on Gen11+ platforms, not even of stale written-to cachelines,
because the various clflushes this could have triggered before the first
use of the buffer were conditional on "obj->cache_dirty &
~obj->cache_coherent", which would still have been false on LLC
platforms.

Anyway you may have fixed that in the next revision of this patch you
sent today by doing things in a completely different way.  But you
didn't answer my question asking why you are doing this on all
platforms.  Cache bypass isn't available on ICL nor earlier platforms,
is it?  And it's been defeatured on TGL:

https://www.spinics.net/lists/intel-gfx/msg219552.html

> -Chris


-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 227 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/intel-gfx/attachments/20191112/38e504a0/attachment-0001.sig>


More information about the Intel-gfx mailing list