[Intel-xe] FW: [PATCH v9 0/3] PAT and cache coherency support

Matthew Auld matthew.auld at intel.com
Mon Nov 13 10:42:38 UTC 2023


On 13/11/2023 09:50, Mrozek, Michal wrote:
> But then we leave performance on the table:
> - if we map on CPU as uncached, the allocation is slow on the CPU side
> - if we map coherent on the GPU side, the allocation is slow on the GPU side
> 
> That's why OpenCL has clear synchronization points map/unmap buffer , that tells the driver where transition is coming from.
> 
> What kind of security is being compromised with this mode ?

When the KMD allocates some system memory and gives it to userspace it 
must always be zeroed first for security reasons. The core-kernel 
ensures the memory is zeroed, but this can leave the CPU caches dirty 
(like when allocated as WB). You also have things like like swap-in, 
where you also allocate system memory, but in this case we don't need to 
zero it and rather just use the CPU to copy the contents from swap back 
into usable system memory. But again CPU caches can be dirty here. 
Malicious user can be devious here and bind the memory as incoherent on 
the GPU, bypassing the CPU caches and potentially peek at the previous 
page contents.

In upstream i915, this is worked around by doing a clflush() of the 
pages for every newly allocated system memory object (unless it's dgpu 
or platform with shared-llc, since such platforms should always be 
coherent). There is also an additional clflush() during swap-in. Also 
every time you initially use some imported dma-buf there is a big nasty 
WBINVD, since it's slightly iffy on how you are meant to get the CPU 
address of the imported memory to be able to pass into clflush(). Also 
this only really works on x86, so you have to hide all this stuff when 
building on non-x86, which is now quite possible with dgpu.

The hope with Xe was to avoid all of that.

> The memory visibility is within a process , so there is no way to get anything from external processes.
> 
> Michal
> 
> 
> -----Original Message-----
> From: Auld, Matthew <matthew.auld at intel.com>
> Sent: Monday, November 13, 2023 10:38 AM
> To: Mrozek, Michal <michal.mrozek at intel.com>; Milczarek, Slawomir <slawomir.milczarek at intel.com>; Tomasz Mistat <tomasz.mistat at linux.intel.com>; intel-xe at lists.freedesktop.org; Arteaga Molina, Jaime A <jaime.a.arteaga.molina at intel.com>; Cetnerowski, Adam <adam.cetnerowski at intel.com>; Chodor, Jaroslaw <jaroslaw.chodor at intel.com>; Dunajski, Bartosz <bartosz.dunajski at intel.com>; Harasimiuk, Artur <artur.harasimiuk at intel.com>; Hoppe, Mateusz <mateusz.hoppe at intel.com>; Jablonski, Mateusz <mateusz.jablonski at intel.com>; Jobczyk, Lukasz <lukasz.jobczyk at intel.com>; Kasper, Kacper K <kacper.k.kasper at intel.com>; Rozenfeld, Piotr <piotr.rozenfeld at intel.com>; Wilma, Pawel <pawel.wilma at intel.com>; Wolny, Bartlomiej <bartlomiej.wolny at intel.com>; Yates, Brandon <brandon.yates at intel.com>; Zdanowicz, Zbigniew <zbigniew.zdanowicz at intel.com>
> Subject: Re: FW: [Intel-xe] [PATCH v9 0/3] PAT and cache coherency support
> 
> On 13/11/2023 09:11, Mrozek, Michal wrote:
>> "
>> Right, if you allocate WB memory (cpu_caching=wb), then you can't currently use an incoherent PAT index if accessing it from the GPU. KMD will reject it at vm_bind."
>> "
>>
>> What about scenarios where I want to have:
>> - CPU WB caching
>> - GPU non coherent access
>>
>> And manually flush cache lines on synchronization points (i.e. clFlush on mapped OpenCL buffers) ?
>> How to do it if KMD would reject VM_BIND operations?
> 
> Currently it is not possible. The hope was to avoid the need for any kind of manual clflushing/wbinvd in the KMD, which would be needed to avoid potential security issues, if we start allowing incoherent GPU access to CPU cached memory.
> 
>>
>> Michal
>>
>> -----Original Message-----
>> From: Auld, Matthew <matthew.auld at intel.com>
>> Sent: Thursday, November 9, 2023 5:08 PM
>> To: Mrozek, Michal <michal.mrozek at intel.com>; Milczarek, Slawomir <slawomir.milczarek at intel.com>; Tomasz Mistat <tomasz.mistat at linux.intel.com>; intel-xe at lists.freedesktop.org; Arteaga Molina, Jaime A <jaime.a.arteaga.molina at intel.com>; Cetnerowski, Adam <adam.cetnerowski at intel.com>; Chodor, Jaroslaw <jaroslaw.chodor at intel.com>; Dunajski, Bartosz <bartosz.dunajski at intel.com>; Harasimiuk, Artur <artur.harasimiuk at intel.com>; Hoppe, Mateusz <mateusz.hoppe at intel.com>; Jablonski, Mateusz <mateusz.jablonski at intel.com>; Jobczyk, Lukasz <lukasz.jobczyk at intel.com>; Kasper, Kacper K <kacper.k.kasper at intel.com>; Rozenfeld, Piotr <piotr.rozenfeld at intel.com>; Wilma, Pawel <pawel.wilma at intel.com>; Wolny, Bartlomiej <bartlomiej.wolny at intel.com>; Yates, Brandon <brandon.yates at intel.com>; Zdanowicz, Zbigniew <zbigniew.zdanowicz at intel.com>
>> Subject: Re: FW: [Intel-xe] [PATCH v9 0/3] PAT and cache coherency support
>>
>> On 09/11/2023 15:24, Mrozek, Michal wrote:
>>> Will KMD fail the mmap call if we provide incorrect mapping ?
>>
>> With Xe there are no mmap flags or anything like that, so you can't really provide an incorrect mapping. Ignoring this series, the previous behaviour was for the KMD to implicitly select the cpu caching mode when allocating the memory, where roughly you would get WC for VRAM and then WB for system memory, unless it's used for scanout in which case you get WC. With this series userspace must now explicitly request the cpu caching mode at allocation via gem_create, which will be used for all CPU access, including KMD internal stuff as well as userspace mmap().
>>
>>> I.e. WB mmap on non-coherent GPU resource?
>>
>> Right, if you allocate WB memory (cpu_caching=wb), then you can't currently use an incoherent PAT index if accessing it from the GPU. KMD will reject it at vm_bind.
>>
>>>
>>> Michal
>>>
>>> -----Original Message-----
>>> From: Auld, Matthew <matthew.auld at intel.com>
>>> Sent: Thursday, November 9, 2023 4:05 PM
>>> To: Milczarek, Slawomir <slawomir.milczarek at intel.com>; Tomasz Mistat
>>> <tomasz.mistat at linux.intel.com>; intel-xe at lists.freedesktop.org;
>>> Arteaga Molina, Jaime A <jaime.a.arteaga.molina at intel.com>;
>>> Cetnerowski, Adam <adam.cetnerowski at intel.com>; Chodor, Jaroslaw
>>> <jaroslaw.chodor at intel.com>; Dunajski, Bartosz
>>> <bartosz.dunajski at intel.com>; Harasimiuk, Artur
>>> <artur.harasimiuk at intel.com>; Hoppe, Mateusz
>>> <mateusz.hoppe at intel.com>; Jablonski, Mateusz
>>> <mateusz.jablonski at intel.com>; Jobczyk, Lukasz
>>> <lukasz.jobczyk at intel.com>; Kasper, Kacper K
>>> <kacper.k.kasper at intel.com>; Mrozek, Michal <michal.mrozek at intel.com>;
>>> Rozenfeld, Piotr <piotr.rozenfeld at intel.com>; Wilma, Pawel
>>> <pawel.wilma at intel.com>; Wolny, Bartlomiej
>>> <bartlomiej.wolny at intel.com>; Yates, Brandon
>>> <brandon.yates at intel.com>; Zdanowicz, Zbigniew
>>> <zbigniew.zdanowicz at intel.com>
>>> Subject: Re: FW: [Intel-xe] [PATCH v9 0/3] PAT and cache coherency
>>> support
>>>
>>> On 09/11/2023 14:58, Milczarek, Slawomir wrote:
>>>> Hi Matthew,
>>>>
>>>> In the compute driver we call GEM_MMAP_OFFSET with flags to specify cpu cache mode (wb/wc):
>>>> GEM_CREATE_EXT() and mmap() + GEM_MMAP_OFFSET(flags=wb/wc)
>>>>
>>>> Starting from Xe KMD we should be able to achieve the same CPU caching policies with GEM_CREATE_EXT:
>>>> GEM_CREATE_EXT(flags=wb/wc) and mmap() + GEM_MMAP_OFFSET(flags=0) Am
>>>> I correct?
>>>
>>> Yes, that's correct.
>>>
>>>>
>>>> Thanks,
>>>> Slawek
>>>>
>>>> -----Original Message-----
>>>> From: Auld, Matthew <matthew.auld at intel.com>
>>>> Sent: Wednesday, November 8, 2023 1:04 PM
>>>> To: Tomasz Mistat <tomasz.mistat at linux.intel.com>;
>>>> intel-xe at lists.freedesktop.org; Arteaga Molina, Jaime A
>>>> <Jaime.A.Arteaga.Molina at intel.com>; Cetnerowski, Adam
>>>> <adam.cetnerowski at intel.com>; Chodor, Jaroslaw
>>>> <jaroslaw.chodor at intel.com>; Dunajski, Bartosz
>>>> <bartosz.dunajski at intel.com>; Harasimiuk, Artur
>>>> <artur.harasimiuk at intel.com>; Hoppe, Mateusz
>>>> <mateusz.hoppe at intel.com>; Jablonski, Mateusz
>>>> <Mateusz.Jablonski at intel.com>; Jobczyk, Lukasz
>>>> <lukasz.jobczyk at intel.com>; Kasper, Kacper K
>>>> <kacper.k.kasper at intel.com>; Milczarek, Slawomir
>>>> <slawomir.milczarek at intel.com>; Mrozek, Michal
>>>> <michal.mrozek at intel.com>; Rozenfeld, Piotr
>>>> <piotr.rozenfeld at intel.com>; Wilma, Pawel <pawel.wilma at intel.com>;
>>>> Wolny, Bartlomiej <bartlomiej.wolny at intel.com>; Yates, Brandon
>>>> <brandon.yates at intel.com>; Zdanowicz, Zbigniew
>>>> <zbigniew.zdanowicz at intel.com>; Dunajski, Bartosz
>>>> <bartosz.dunajski at intel.com>
>>>> Subject: Re: FW: [Intel-xe] [PATCH v9 0/3] PAT and cache coherency
>>>> support
>>>>
>>>> On 08/11/2023 11:30, Tomasz Mistat wrote:
>>>>> compute-umd_uapi_review_team at intel.com,bartosz.dunajski at intel.com,sa
>>>>> u
>>>>> r
>>>>> abhg.gupta at intel.com,ulisses.furquim at intel.com,michal.mrozek at intel.c
>>>>> o
>>>>> m
>>>>> ,tomasz.mistat at intel.com,piotr.rozenfeld at intel.com,ayaz.siddiqui at int
>>>>> e
>>>>> l
>>>>> .com,luis.strano at intel.com,filip.hazubski at intel.com,filip.hazubski at i
>>>>> n t el.com,mateusz.naklicki at intel.com,kamil.kopryk at intel.com
>>>>> Bcc:
>>>>> Subject: Re: [Intel-xe] [PATCH v9 0/3] PAT and cache coherency
>>>>> support
>>>>> Reply-To:
>>>>> In-Reply-To: <20231103154309.402159-5-matthew.auld at intel.com>
>>>>>
>>>>> On 2023-11-03 at 15:43:10 +0000, Matthew Auld wrote:
>>>>>> Branch available here:
>>>>>> https://gitlab.freedesktop.org/mwa/kernel/-/tree/xe-pat-index?ref_t
>>>>>> y
>>>>>> p
>>>>>> e=heads
>>>>>>
>>>>>> IGT changes:
>>>>>> https://patchwork.freedesktop.org/series/124667/
>>>>>>
>>>>>> Mesa:
>>>>>> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25462
>>>>>>
>>>>>> Goal here is to allow userspace to directly control the pat_index
>>>>>> when mapping memory via the ppGTT, in addtion to the CPU caching
>>>>>> mode. This is very much needed on newer igpu platforms which allow
>>>>>> incoherent GT access, where the choice over the cache level and
>>>>>> expected coherency is best left to userspace depending on their
>>>>>> usecase.  In the future there may also be other stuff encoded in the pat_index, so giving userspace direct control will also be needed there.
>>>>>>
>>>>>> To support this we added new gem_create uAPI for selecting the CPU
>>>>>> cache mode to use for system memory, including the expected GPU
>>>>>> coherency mode. There are various restrictions here for the
>>>>>> selected coherency mode and compatible CPU cache modes.  With that
>>>>>> in place the actual pat_index can now be provided as part of
>>>>>> vm_bind. The only restriction is that the coherency mode of the
>>>>>> pat_index must be at least as coherent as the gem_create coherency mode. There are also some special cases like with userptr and dma-buf.
>>>>>>
>>>>>> v2:
>>>>>>        - Loads of improvements/tweaks. Main changes are to now allow
>>>>>>          gem_create.coh_mode <= coh_mode(pat_index), rather than it needing to match
>>>>>>          exactly. This simplifies the dma-buf policy from userspace pov. Also we now
>>>>>>          only consider COH_NONE and COH_AT_LEAST_1WAY.
>>>>>> v3:
>>>>>>        - Rebase. Split the pte_encode() refactoring, plus various smaller tweaks and
>>>>>>          fixes.
>>>>>> v4:
>>>>>>        - Rebase on Lucas' new series.
>>>>>>        - Drop UC cache mode.
>>>>>>        - s/smem_cpu_caching/cpu_caching/. Idea is to make VRAM WC explicit in the
>>>>>>          uapi, plus make it more future proof.
>>>>>> v5:
>>>>>>        - Rebase, plus some small tweaks and fixes.
>>>>>> v6:
>>>>>>        - CI hooks fixes + checkpatch.
>>>>>> v7:
>>>>>>        - Some small tweaks
>>>>>> v8:
>>>>>>        - Rebase on Xe2 PAT table additions.
>>>>>> v9:
>>>>>>        - Drop coh_mode.
>>>>>>
>>>>>> --
>>>>>> 2.41.0
>>>>>>
>>>>>
>>>>> I'm sharing down here questions passed by bartosz.dunajski at intel.com
>>>>> lets continue review here at respective ML
>>>>>
>>>>> Sent: Wednesday, November 8, 2023 09:05
>>>>> Subject: RE: [Intel-xe] [PATCH v9 0/3] PAT and cache coherency
>>>>> support
>>>>>
>>>>> Just to confirm my understanding. We will have 2 separate APIs?
>>>>> 1.	For gem_create to explicitly set coherency/caching according to API enums
>>>>
>>>> Yeah, you just need to provide the cpu_caching mode (wc or wb). If you later mmap() the object this same caching mode is used. Also any KMD internal things like swap-in will also respect this caching mode. The coh_mode is now removed.
>>>>
>>>>> 2.	For vm_bind to set PAT index that is aligned with Bspec and controls more than #1. It can also override coherency/caching that was previously set?
>>>>
>>>> Right, as part of vm_bind you provide whatever PAT index control (as
>>>> per
>>>> Bspec) you want for each mapping of the object (you could map it multiple times).
>>>>
>>>> When choosing a PAT index the cpu_caching only matters if you choose an incoherent PAT index (coh_none in Bspec), since we don't want to allow cpu_caching=wb and coh_none, since that leads to some complications in the KMD.
>>>>
>>>>>
>>>>>


More information about the Intel-xe mailing list