[Intel-xe] [PATCH DONTMERGE] drm/xe: uapi review submission

Thomas Hellström thomas.hellstrom at linux.intel.com
Fri Jul 7 07:22:34 UTC 2023


Hi, Matt,

On 7/7/23 00:30, Matt Roper wrote:
> On Thu, Jul 06, 2023 at 08:14:52AM +0200, Thomas Hellström wrote:
>> Hi, Matt,
>>
>> Thanks for looking into this. Some comments below. I think we need to
>> set up tasks for the various review concerns and assign people to have them
>> resolved:
> ...
>>>> +struct drm_xe_vm_bind {
>>> Should there be a field in here to allow userspace to specify which PAT
>>> index corresponds to the behavior (caching, coherency, CLOS, etc.) they
>>> want on this binding?  The PAT should be a characteristic of the bind
>>> rather than of the underlying object, right?
>> Yes, subject to certain immutable restrictions set at buffer-object
>> creation time. For example, the CPU caching mode restricts the
>> coherency modes usable in the PAT, and the CPU caching mode needs to
>> be specified at bo creation time. The current suggestion is therefore
>> that the coherency mode also needs to be specified at bo creation
>> time, a slightly stricter limitation on the available PAT indices at
>> VM_BIND time. Whether we want to include PAT in the main struct or as
>> an extension hasn't been decided yet.
> I'm not sure I fully understand this one; I thought CPU caching to be
> mostly orthogonal to GPU coherency mode.  Do you mean we'd step in and
> prevent combinations like CPU:WB + GPU:non-coherent since then the GPU
> might not see updates done by the CPU that are still stuck in the CPU
> cache and/or the CPU might read stale data from its cache after the GPU
> has updated the buffer?  It seems like there are so many ways userspace
> could already shoot itself in the foot if it picked nonsense settings
> that I'm surprised we'd try to stop it here.  Even for the example given
> there might be some cases where it still winds up being fine (e.g., the
> CPU side always does clflushes at the necessary times and the GPU never
> does any writes that would need to invalidate the CPU cache).
>
> Am I overlooking something, or are the combinations that will get
> restricted based on something else entirely?

The reason we try to stop the GPU from bypassing the CPU cache, by 
choosing non-coherent GPU mappings of WB system memory, is so that 
user-space can't sidestep the page-wiping done by the kernel and read 
previous, possibly sensitive, page contents: the zeros written when 
wiping a page may still sit in the CPU cache, so a non-coherent GPU 
read would fetch the old data straight from memory. The intention is 
not really to stop user-space from shooting itself in the foot; it's 
still free to do that.
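
To make the restriction concrete, below is a minimal standalone sketch 
of the kind of bind-time check this implies. All names here 
(cpu_caching, coh_mode and friends) are illustrative only, not the 
actual xe uAPI:

#include <stdbool.h>

/* CPU caching mode, fixed at bo creation time. */
enum cpu_caching { CPU_CACHING_WB, CPU_CACHING_WC };

/* GPU coherency mode, taken from the PAT entry a bind selects. */
enum coh_mode { COH_NONE, COH_AT_LEAST_1WAY };

struct bo_sketch {
	enum cpu_caching cpu_caching;
};

struct pat_entry_sketch {
	enum coh_mode coh_mode;
	/* caching, CLOS, ... */
};

/*
 * Non-coherent GPU access to WB-mapped system memory is the one
 * combination that must be rejected: the kernel's page wipe may still
 * sit in the CPU cache, so a non-coherent GPU read could fetch the
 * old, possibly sensitive, contents straight from memory.
 */
static bool pat_index_allowed(const struct bo_sketch *bo,
			      const struct pat_entry_sketch *pat)
{
	return !(pat->coh_mode == COH_NONE &&
		 bo->cpu_caching == CPU_CACHING_WB);
}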

There are three ways we can do this:

1) Kernel clflushes, like i915. This works only for integrated, since 
it requires a guarantee that the processor never writes back a clean 
(prefetched) cache line, and we only have that guarantee for Intel 
processors. Linus has made it clear that we should avoid relying on 
something arch- or processor-dependent like this, but we might get 
away with it on integrated.

2) Write-combined system memory mappings, making the memory appear 
coherent anyway (probably only works on x86). This effectively moves 
the clflush to the WB->WC transition in the x86 arch layer of the 
kernel; I think amdgpu is using this. See the sketch after this list.

3) We've been discussing GPU flushing of the CPU cache using a read 
with a suitable PAT setting. That's the only way we can do this 
generically on discrete, I think, if discrete ever decides to 
implement non-coherent system page mappings. For now, we've been told 
that no current or planned PCIe discrete part will implement the PCIe 
no-snoop hint; not sure what happens with CXL.
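
As a rough illustration of 2): the flush happens as a side effect of 
the kernel changing the page attributes. A minimal kernel-side sketch, 
assuming x86 (the real path would go through TTM's page pools rather 
than a helper like this):

#include <asm/set_memory.h>
#include <linux/mm_types.h>

/*
 * Sketch only: switch backing pages from WB to WC. The x86 attribute
 * change in set_pages_array_wc() flushes the affected CPU cache
 * lines, which is where the "clflush" effectively moves to.
 */
static int sketch_make_pages_wc(struct page **pages, int npages)
{
	return set_pages_array_wc(pages, npages);
}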

For uAPI purposes, we landed on 2). That would also allow us to base 
the implementation on any of 1-3 if it turns out to be needed. WC 
system memory can only be chosen at backing-store creation, which is a 
TTM restriction; see the sketch below.
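
A sketch of where that restriction comes from, assuming a TTM-backed 
bo: the caching mode of the backing pages is fixed when the ttm_tt is 
created, so WB vs. WC has to be decided at bo creation time and can't 
change per VM_BIND. The want_wc parameter is hypothetical:

#include <drm/ttm/ttm_tt.h>

static int sketch_bo_tt_init(struct ttm_buffer_object *bo,
			     struct ttm_tt *tt, bool want_wc)
{
	enum ttm_caching caching = want_wc ? ttm_write_combined :
					     ttm_cached;

	/* The caching mode is baked in here; later VM_BINDs can only
	 * use PAT indices compatible with it. */
	return ttm_tt_init(tt, bo, 0, caching, 0);
}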

These were all part of the PAT presentations at the Xe weekly sync.

Of course, if you think there are reasons we should reconsider, these 
decisions are not fully set in stone yet.

/Thomas


>
>
> Matt
>

