Re: [Intel-gfx] [PATCH 1/5] drm/i915: document caching related bits

14 Jul 2021

      On Wed, Jul 14, 2021 at 01:16:57PM +0200, Daniel Vetter wrote:
...
On Tue, Jul 13, 2021 at 09:46:30PM +0300, Ville Syrjälä wrote:
...
On Tue, Jul 13, 2021 at 07:24:23PM +0100, Matthew Auld wrote:
...
On Tue, 13 Jul 2021 at 18:47, Ville Syrjälä
ville.syrjala@linux.intel.com wrote:
...
On Tue, Jul 13, 2021 at 05:13:37PM +0100, Matthew Auld wrote:
...
On Tue, 13 Jul 2021 at 16:55, Ville Syrjälä
ville.syrjala@linux.intel.com wrote:
...
On Tue, Jul 13, 2021 at 11:45:50AM +0100, Matthew Auld wrote:
> +     /**
> +      * @cache_coherent:
> +      *
> +      * Track whether the pages are coherent with the GPU if reading or
> +      * writing through the CPU cache.
> +      *
> +      * This largely depends on the @cache_level, for example if the object
> +      * is marked as I915_CACHE_LLC, then GPU access is coherent for both
> +      * reads and writes through the CPU cache.
> +      *
> +      * Note that on platforms with shared-LLC support(HAS_LLC) reads through
> +      * the CPU cache are always coherent, regardless of the @cache_level. On
> +      * snooping based platforms this is not the case, unless the full
> +      * I915_CACHE_LLC or similar setting is used.
> +      *
> +      * As a result of this we need to track coherency separately for reads
> +      * and writes, in order to avoid superfluous flushing on shared-LLC
> +      * platforms, for reads.
> +      *
> +      * I915_BO_CACHE_COHERENT_FOR_READ:
> +      *
> +      * When reading through the CPU cache, the GPU is still coherent. Note
> +      * that no data has actually been modified here, so it might seem
> +      * strange that we care about this.
> +      *
> +      * As an example, if some object is mapped on the CPU with write-back
> +      * caching, and we read some page, then the cache likely now contains
> +      * the data from that read. At this point the cache and main memory
> +      * match up, so all good. But next the GPU needs to write some data to
> +      * that same page. Now if the @cache_level is I915_CACHE_NONE and the
> +      * the platform doesn't have the shared-LLC, then the GPU will
> +      * effectively skip invalidating the cache(or however that works
> +      * internally) when writing the new value.  This is really bad since the
> +      * GPU has just written some new data to main memory, but the CPU cache
> +      * is still valid and now contains stale data. As a result the next time
> +      * we do a cached read with the CPU, we are rewarded with stale data.
> +      * Likewise if the cache is later flushed, we might be rewarded with
> +      * overwriting main memory with stale data.
> +      *
> +      * I915_BO_CACHE_COHERENT_FOR_WRITE:
> +      *
> +      * When writing through the CPU cache, the GPU is still coherent. Note
> +      * that this also implies I915_BO_CACHE_COHERENT_FOR_READ.
> +      *
> +      * This is never set when I915_CACHE_NONE is used for @cache_level,
> +      * where instead we have to manually flush the caches after writing
> +      * through the CPU cache. For other cache levels this should be set and
> +      * the object is therefore considered coherent for both reads and writes
> +      * through the CPU cache.
I don't remember why we have this read vs. write split and this new
documentation doesn't seem to really explain it either.
Hmm, I attempted to explain that earlier:

Note that on platforms with shared-LLC support(HAS_LLC) reads through
the CPU cache are always coherent, regardless of the @cache_level. On
snooping based platforms this is not the case, unless the full
I915_CACHE_LLC or similar setting is used.

As a result of this we need to track coherency separately for reads
and writes, in order to avoid superfluous flushing on shared-LLC
platforms, for reads.

So AFAIK it's just because shared-LLC can be coherent for reads, while
also not being coherent for writes(CACHE_NONE),
CPU vs. GPU is fully coherent when it comes to LLC. Or at least I've
never heard of any mechanism that would make it only partially coherent.
What do you mean by "comes to LLC", are you talking about HAS_LLC() or
I915_CACHE_LLC?
I'm talking about the actual cache.
...
If you set I915_CACHE_LLC, then yes it is fully coherent for both
HAS_LLC() and HAS_SNOOP().
If you set I915_CACHE_NONE, then reads are still coherent on
HAS_LLC(),
Reads and writes both. The only thing that's not coherent is the
display engine.
There's a lot of code which seems to disagree,
Can't even imagine why anyone would make a cache coherency protocol
that only handles reads but not writes...
...
plus there's now this new
MOCS thing.
That's just a full LLC bypass AFAICS. Can't omit invalidates if
you use that one or you'll just get stale data from the cache
on reads as well.
...
I really hope we don't have all those cache coherency bits
just because the code complexity is entertaining?
They were definitely added to fix a display issue, and before
that it was just a single flag, which wasn't doing what the display
needed. I think before the flag was added we used some other indicators
to check when we need to clflush, or maybe we did a some extra pointless
clflushes here and there and the broken single flag was supposed to
avoid those. Not quite sure.
I suppose these two flags should maybe have been named more like
"needs invalidate" and "needs writeback" to make it clear what 
one needs to do.
-- 
Ville Syrjälä
Intel

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

Re: [Intel-gfx] [PATCH 1/5] drm/i915: document caching related bits