Re: [Intel-gfx] [PATCH 1/5] drm/i915: document caching related bits

13 Jul 2021

Matthew Auld matthew.auld@intel.com writes:
...
Try to document the object caching related bits, like cache_coherent and
cache_dirty.
Suggested-by: Daniel Vetter daniel.vetter@ffwll.ch
Signed-off-by: Matthew Auld matthew.auld@intel.com

.../gpu/drm/i915/gem/i915_gem_object_types.h  | 135 +++++++++++++++++-
 drivers/gpu/drm/i915/i915_drv.h               |   9 --
 2 files changed, 131 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index ef3de2ae9723..02c3529b774c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -92,6 +92,57 @@ struct drm_i915_gem_object_ops {
   const char *name; /* friendly name for debug, e.g. lockdep classes */
 };
+/**
+ * enum i915_cache_level - The supported GTT caching values for system memory
+ * pages.
+ *
+ * These translate to some special GTT PTE bits when binding pages into some
+ * address space. It also determines whether an object, or rather its pages are
+ * coherent with the GPU, when also reading or writing through the CPU cache
+ * with those pages.
+ *
+ * Userspace can also control this through struct drm_i915_gem_caching.
+ */
+enum i915_cache_level {
+	/**
+	 * @I915_CACHE_NONE:
+	 *
+	 * Not coherent with the CPU cache. If the cache is dirty and we need
+	 * the underlying pages to be coherent with some later GPU access then
+	 * we need to manually flush the pages.
+	 *
+	 * Note that on shared-LLC platforms reads through the CPU cache are
+	 * still coherent even with this setting. See also
+	 * I915_BO_CACHE_COHERENT_FOR_READ for more details.
+	 */
+	I915_CACHE_NONE = 0,
+	/**
+	 * @I915_CACHE_LLC:
+	 *
+	 * Coherent with the CPU cache. If the cache is dirty, then the GPU will
+	 * ensure that access remains coherent, when both reading and writing
+	 * through the CPU cache.
+	 *
+	 * Applies to both platforms with shared-LLC(HAS_LLC), and snooping
+	 * based platforms(HAS_SNOOP).
+	 */
+	I915_CACHE_LLC,
+	/**
+	 * @I915_CACHE_L3_LLC:
+	 *
+	 * gen7+, L3 sits between the domain specifc caches, eg sampler/render

typo: specifc
...

+	 * caches, and the large Last-Level-Cache. LLC is coherent with the CPU,
+	 * but L3 is only visible to the GPU.
+	 */

I don't get the difference between this and I915_CACHE_LLC.
Could the difference between LLC and L3_LLC be described here with an example?
Thanks,
-Mika
...

+	I915_CACHE_L3_LLC,
+	/**
+	 * @I915_CACHE_WT:
+	 *
+	 * hsw:gt3e Write-through for scanout buffers.
+	 */
+	I915_CACHE_WT,
+};
+
 enum i915_map_type {
   I915_MAP_WB = 0,
   I915_MAP_WC,
@@ -228,14 +279,90 @@ struct drm_i915_gem_object {
   unsigned int mem_flags;
 #define I915_BO_FLAG_STRUCT_PAGE BIT(0) /* Object backed by struct pages */
 #define I915_BO_FLAG_IOMEM       BIT(1) /* Object backed by IO memory */

-	/*
-	 * Is the object to be mapped as read-only to the GPU
-	 * Only honoured if hardware has relevant pte bit
+	/**
+	 * @cache_level: The desired GTT caching level.
+	 *
+	 * See enum i915_cache_level for possible values, along with what
+	 * each does.
 	 */
 	unsigned int cache_level:3;
-	unsigned int cache_coherent:2;


+	/**
+	 * @cache_coherent:
+	 *
+	 * Track whether the pages are coherent with the GPU if reading or
+	 * writing through the CPU cache.
+	 *
+	 * This largely depends on the @cache_level, for example if the object
+	 * is marked as I915_CACHE_LLC, then GPU access is coherent for both
+	 * reads and writes through the CPU cache.
+	 *
+	 * Note that on platforms with shared-LLC support(HAS_LLC) reads through
+	 * the CPU cache are always coherent, regardless of the @cache_level. On
+	 * snooping based platforms this is not the case, unless the full
+	 * I915_CACHE_LLC or similar setting is used.
+	 *
+	 * As a result of this we need to track coherency separately for reads
+	 * and writes, in order to avoid superfluous flushing on shared-LLC
+	 * platforms, for reads.
+	 *
+	 * I915_BO_CACHE_COHERENT_FOR_READ:
+	 *
+	 * When reading through the CPU cache, the GPU is still coherent. Note
+	 * that no data has actually been modified here, so it might seem
+	 * strange that we care about this.
+	 *
+	 * As an example, if some object is mapped on the CPU with write-back
+	 * caching, and we read some page, then the cache likely now contains
+	 * the data from that read. At this point the cache and main memory
+	 * match up, so all good. But next the GPU needs to write some data to
+	 * that same page. Now if the @cache_level is I915_CACHE_NONE and
+	 * the platform doesn't have the shared-LLC, then the GPU will
+	 * effectively skip invalidating the cache(or however that works
+	 * internally) when writing the new value.  This is really bad since the
+	 * GPU has just written some new data to main memory, but the CPU cache
+	 * is still valid and now contains stale data. As a result the next time
+	 * we do a cached read with the CPU, we are rewarded with stale data.
+	 * Likewise if the cache is later flushed, we might be rewarded with
+	 * overwriting main memory with stale data.
+	 *
+	 * I915_BO_CACHE_COHERENT_FOR_WRITE:
+	 *
+	 * When writing through the CPU cache, the GPU is still coherent. Note
+	 * that this also implies I915_BO_CACHE_COHERENT_FOR_READ.
+	 *
+	 * This is never set when I915_CACHE_NONE is used for @cache_level,
+	 * where instead we have to manually flush the caches after writing
+	 * through the CPU cache. For other cache levels this should be set and
+	 * the object is therefore considered coherent for both reads and writes
+	 * through the CPU cache.
+	 */
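For illustration only, the stale-read scenario described under I915_BO_CACHE_COHERENT_FOR_READ can be sketched as a toy model in plain C. This is not driver code: the struct toy_page type and the toy_* helpers are hypothetical names invented here, and a single int stands in for a page and its CPU cache line. The "coherent" flag models whether the GPU write invalidates the CPU cache line.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical, not i915 code): one page plus one CPU cache line. */
struct toy_page {
	int memory;        /* value currently in main memory */
	int cached;        /* value held in the CPU cache line */
	bool cache_valid;  /* whether the CPU cache line is valid */
};

/* Cached CPU read: fill the line from memory on a miss, then return the line. */
static int toy_cpu_read(struct toy_page *p)
{
	assert(p);
	if (!p->cache_valid) {
		p->cached = p->memory;
		p->cache_valid = true;
	}
	return p->cached;
}

/* GPU write to main memory; only a coherent write invalidates the CPU line. */
static void toy_gpu_write(struct toy_page *p, int value, bool coherent)
{
	assert(p);
	p->memory = value;
	if (coherent)
		p->cache_valid = false;
}

/* Manual invalidate of a clean line, so the next read refetches main memory. */
static void toy_cpu_invalidate(struct toy_page *p)
{
	assert(p);
	p->cache_valid = false;
}
```

In this model, a CPU read followed by a non-coherent GPU write leaves toy_cpu_read() returning the old value until toy_cpu_invalidate() is called, which is the stale-data case the comment describes. With coherent set, the next read picks up the new value without any manual step.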



 #define I915_BO_CACHE_COHERENT_FOR_READ BIT(0)
 #define I915_BO_CACHE_COHERENT_FOR_WRITE BIT(1)
+	unsigned int cache_coherent:2;
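As a reading aid, the rules in the @cache_coherent comment can be condensed into a small standalone sketch. This assumes the mapping is exactly what the comment states (all levels other than I915_CACHE_NONE are coherent for reads and writes; I915_CACHE_NONE is read-coherent only with shared LLC). The helper name sketch_cache_coherent() and its has_llc parameter are hypothetical, and BIT() is spelled out so the snippet builds outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Same values as the enum documented in the patch. */
enum i915_cache_level {
	I915_CACHE_NONE = 0,
	I915_CACHE_LLC,
	I915_CACHE_L3_LLC,
	I915_CACHE_WT,
};

/* Userspace stand-ins for the kernel's BIT(0) and BIT(1). */
#define I915_BO_CACHE_COHERENT_FOR_READ  (1u << 0)
#define I915_BO_CACHE_COHERENT_FOR_WRITE (1u << 1)

/*
 * Hypothetical helper, not a driver function: derive the coherency bits
 * from the cache level and from whether the platform has a shared LLC.
 */
static unsigned int sketch_cache_coherent(enum i915_cache_level level,
					  bool has_llc)
{
	assert(level <= I915_CACHE_WT);

	/* Any level other than NONE: coherent for reads and writes. */
	if (level != I915_CACHE_NONE)
		return I915_BO_CACHE_COHERENT_FOR_READ |
		       I915_BO_CACHE_COHERENT_FOR_WRITE;

	/* NONE on a shared-LLC platform: reads are still coherent. */
	if (has_llc)
		return I915_BO_CACHE_COHERENT_FOR_READ;

	/* NONE without shared LLC: not coherent at all. */
	return 0;
}
```

Keeping reads and writes as separate bits is what lets the I915_CACHE_NONE plus shared-LLC case skip flushing for reads while still requiring it after CPU writes.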

+	/**
+	 * @cache_dirty:
+	 *
+	 * Track if the cache might be dirty for the @pages i.e. it has yet to be
+	 * written back to main memory. As a result reading directly from main
+	 * memory might yield stale data.
+	 *
+	 * This also ties into whether the kernel is tracking the object as
+	 * coherent with the GPU, as per @cache_coherent, as it determines if
+	 * flushing might be needed at various points.
+	 *
+	 * Another part of @cache_dirty is managing flushing when first
+	 * acquiring the pages for system memory, at this point the pages are
+	 * considered foreign, so the default assumption is that the cache is
+	 * dirty, for example the page zeroing done by the kernel might leave
+	 * writes through the CPU cache, or swapping-in, while the actual data in
+	 * main memory is potentially stale.  Note that this is a potential
+	 * security issue when dealing with userspace objects and zeroing. Now,
+	 * whether we actually need to apply the big sledgehammer of flushing all
+	 * the pages on acquire depends on if @cache_coherent is marked as
+	 * I915_BO_CACHE_COHERENT_FOR_WRITE, i.e. that the GPU will be coherent
+	 * for both reads and writes through the CPU cache. So pretty much this
+	 * should only be needed for I915_CACHE_NONE objects.
+	 */
 	unsigned int cache_dirty:1;
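The flush-on-acquire decision described in the @cache_dirty comment can be sketched roughly as follows. This is a standalone illustration under the assumption that the rule is exactly "flush if the cache may be dirty and the object is not write-coherent". The struct sketch_object type and both helpers are hypothetical names, and the actual flush is reduced to clearing the dirty bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins for the kernel's BIT(0) and BIT(1). */
#define I915_BO_CACHE_COHERENT_FOR_READ  (1u << 0)
#define I915_BO_CACHE_COHERENT_FOR_WRITE (1u << 1)

/* Hypothetical stand-in for the two bitfields documented in the patch. */
struct sketch_object {
	unsigned int cache_coherent:2;
	unsigned int cache_dirty:1;
};

/*
 * Freshly acquired pages are treated as foreign, so the cache is assumed
 * dirty. A flush is only needed if the GPU is not coherent with writes
 * made through the CPU cache.
 */
static bool sketch_needs_flush(const struct sketch_object *obj)
{
	assert(obj);
	return obj->cache_dirty &&
	       !(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE);
}

/* Model of the flush: after writeback the cache is no longer dirty. */
static void sketch_flush(struct sketch_object *obj)
{
	assert(obj);
	obj->cache_dirty = 0;
}
```

Under this rule an object with only I915_BO_CACHE_COHERENT_FOR_READ set still needs the flush, which matches the comment's remark that this should pretty much only matter for I915_CACHE_NONE objects.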
/**


diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c4747f4407ef..37bb1a3cadd4 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -394,15 +394,6 @@ struct drm_i915_display_funcs {
   void (*read_luts)(struct intel_crtc_state *crtc_state);
 };
-enum i915_cache_level {
-	I915_CACHE_NONE = 0,
-	I915_CACHE_LLC, /* also used for snoopable memory on non-LLC */
-	I915_CACHE_L3_LLC, /* gen7+, L3 sits between the domain specifc
-	      caches, eg sampler/render caches, and the
-	      large Last-Level-Cache. LLC is coherent with
-	      the CPU, but L3 is only visible to the GPU. */
-	I915_CACHE_WT, /* hsw:gt3e WriteThrough for scanouts */
-};
#define I915_COLOR_UNEVICTABLE (-1) /* a non-vma sharing the address space */
-- 
2.26.3

Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

    
