[Intel-xe] LLC configurating, mmap and bo cache management questions

Wed Dec 6 08:26:57 UTC 2023

On Tue, 2023-12-05 at 14:19 +0000, Tvrtko Ursulin wrote:
> 
> Hi,
> 
> We are working on adding xe support to ChromeOS minigbm and have a
> couple of questions.
> 
> If I follow things correctly, with xe the mmap caching mode is fixed
> to the object caching mode set at bo create. For framebuffers it will
> be WC, and for the rest userspace can choose WB or WC via
> drm_xe_gem_create->cpu_caching. (Unless discrete, when WB cannot be
> used at all.)
> 
> AFAICT minigbm basically cares about two transition points. Let's
> call them CPU access begin and end.
> 
> 1)
> When a bo is mmapped it wants to invalidate the cache, which looks to
> be about making sure all GPU writes have landed to the backing store.
> In the i915 world that translates to the set_domain ioctl.
> 
> What is the uapi for this with xe, or is it somehow guaranteed to not
> be needed?

Once a user-fence, or a dma-fence obtained as an out-fence from an exec
call, has signalled, GPU caches are guaranteed to be flushed. Currently
I don't think there is anything like gem wait in the uAPI, although
Matt is just about to add functionality to wait on all outstanding work
on an exec_queue.

> 
> 2)
> When a bo is unmapped, or CPU access has finished, it wants to flush
> the CPU caches. That is /almost/ completely a CPU operation, where it
> just needs to either clflush or invalidate the WC buffer respectively,
> were it not for the fact that clflush can be skipped on platforms with
> LLC.
> 
> I did not see an equivalent of I915_PARAM_HAS_LLC in xe. Did I miss
> it, or what is the plan for querying this detail?

XeKMD is generally coherent, except if UMD selects a GPU PAT index with
limited coherency together with WB instead of WC memory. In that case,
UMD is responsible for doing the needed CLFLUSH-ing, whereas KMD only
ensures initial clearing of the pages is CLFLUSHED for security
reasons.
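
To illustrate that rule, here is a minimal sketch of what a UMD "CPU
access end" hook could look like. All names (bo_cpu_caching,
bo_gpu_coherency, bo_needs_clflush, flush_range) are hypothetical and
not taken from the xe uAPI, and the 64-byte cache line size is an
assumption:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__)
#include <emmintrin.h>   /* _mm_clflush(), _mm_mfence() */
#endif

/* Hypothetical UMD-side bookkeeping, not xe uAPI definitions. */
enum bo_cpu_caching   { BO_CPU_CACHING_WB, BO_CPU_CACHING_WC };
enum bo_gpu_coherency { BO_GPU_COH_FULL, BO_GPU_COH_LIMITED };

#define CACHELINE_SIZE 64u   /* assumption: 64-byte cache lines */

/* Only WB memory paired with a limited-coherency PAT index needs it. */
static bool bo_needs_clflush(enum bo_cpu_caching cpu,
                             enum bo_gpu_coherency gpu)
{
    return cpu == BO_CPU_CACHING_WB && gpu == BO_GPU_COH_LIMITED;
}

/*
 * Flush every cache line overlapping [addr, addr + len).
 * Returns the number of cache lines visited.
 */
static size_t flush_range(const void *addr, size_t len)
{
    uintptr_t p = (uintptr_t)addr & ~(uintptr_t)(CACHELINE_SIZE - 1);
    uintptr_t end = (uintptr_t)addr + len;
    size_t lines = 0;

    if (len == 0)
        return 0;
#if defined(__x86_64__)
    _mm_mfence();            /* order earlier stores before the flushes */
#endif
    for (; p < end; p += CACHELINE_SIZE) {
#if defined(__x86_64__)
        _mm_clflush((const void *)p);
#endif
        lines++;
    }
#if defined(__x86_64__)
    _mm_mfence();            /* make sure the flushes have completed */
#endif
    return lines;
}
```

The hook would call flush_range() on the mapping only when
bo_needs_clflush() returns true, and skip the flush for WC mappings and
for fully coherent PAT indices.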

I'm not 100% sure if UMD can actually select WB with limited coherency
PAT index in the initial uAPI revision, but Matthew has received
requests for that so any additional input here on performance
implications is appreciated.

Otherwise, the thinking here is that GPU PAT indices with limited
coherency should be used together with WC memory, in the same
situations where VRAM/LMEM is used on DGFX.

/Thomas

> 
> Regards,
> 
> Tvrtko