[PATCH 1/3] drm/amdgpu: Add userptr bo support for mGPUs when iommu is on
Felix Kuehling
felix.kuehling at amd.com
Wed Apr 12 14:29:28 UTC 2023
On 2023-04-12 09:35, Philip Yang wrote:
>
> On 2023-04-04 09:45, Felix Kuehling wrote:
>> [+Philip]
>>
>> On 2023-04-04 08:47, Christian König wrote:
>>
>>> On 2023-04-04 12:56, Xiao, Shane wrote:
>>>> [AMD Official Use Only - General]
>>>>
>>>>> -----Original Message-----
>>>>> From: Koenig, Christian <Christian.Koenig at amd.com>
>>>>> Sent: Tuesday, April 4, 2023 6:27 PM
>>>>> To: Xiao, Shane <shane.xiao at amd.com>; amd-gfx at lists.freedesktop.org;
>>>>> Kuehling, Felix <Felix.Kuehling at amd.com>
>>>>> Cc: Liu, Aaron <Aaron.Liu at amd.com>; Guo, Shikai <Shikai.Guo at amd.com>
>>>>> Subject: Re: [PATCH 1/3] drm/amdgpu: Add userptr bo support for mGPUs
>>>>> when iommu is on
>>>>>
>>>>> On 2023-04-04 11:56, Shane Xiao wrote:
>>>>>> For a userptr BO with the IOMMU on, multiple GPUs use the same
>>>>>> system memory DMA mapping address when bo_adev and adev are both
>>>>>> in identity mode or in the same IOMMU group.
>>>> Hi Christian,
>>>>
>>>>> WTF? Userptr BOs are not allowed to be exported/imported between
>>>>> different
>>>>> GPUs.
>>>>>
>>>>> So how can the same userptr BO be used on different GPUs?
>>>> If the GPUs are all in IOMMU identity mode, which means DMA addresses
>>>> are the same as physical addresses, all of the GPUs can see system
>>>> memory directly.
>>>>
>>>> In that case, should we export/import the BO and then create a new
>>>> SG BO for the other GPU?
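For context, "identity mode" here means the device's DMA addresses equal
physical addresses because there is either no translating IOMMU domain or
an identity (pass-through) domain in front of it. A minimal sketch of that
check (the helper name is made up for illustration; amdgpu records the
result in adev->ram_is_direct_mapped):

#include <linux/device.h>
#include <linux/iommu.h>

/* True if DMA addresses for this device are identical to physical
 * addresses, i.e. no translating IOMMU domain sits in between.
 */
static bool dev_ram_is_direct_mapped(struct device *dev)
{
	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);

	return !domain || domain->type == IOMMU_DOMAIN_IDENTITY;
}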
>>>
>>> Yes, absolutely. Each userptr BO is only meant to be used on one GPU.
>>>
>>> Even if you could use the same BO for multiple GPUs, it's not
>>> necessarily a good idea, since you then have lifetime problems, for
>>> example.
>>>
>>> E.g. what happens when one GPU is hot-removed while the other one
>>> that imported the BO is still using it?
>>>
>>> Felix, can you comment on that? My recollection was that we would
>>> rather improve the storage of DMA addresses instead of duplicating
>>> the BOs over different GPUs.
>>
>> For KFD we currently enable sharing of userptr BOs between multiple
>> GPUs by creating a userptr BO for the first GPU and creating
>> additional SG BOs using the same page list for the additional GPUs.
>> That way we don't have to call hmm_range_fault N times and set up N
>> MMU notifiers for the same address range on an N-GPU system. But we
>> do have to create N DMA mappings, which is facilitated by the SG BOs.
>>
>> We have a further optimization to not even store separate DMA
>> addresses per GPU if they are a direct mapping. In that case we just
>> increase the reference count on the original userptr BO. (I agree
>> that we should also look into more efficient storage of DMA
>> addresses. However, the last time we talked about this, you basically
>> told us that scatter-gather tables are being deprecated, but I
>> haven't seen the alternative yet.)
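To make the scheme described above concrete, here is a rough sketch of the
per-GPU attachment decision. can_reuse_dma() and create_sg_attachment() are
placeholders for illustration, not the actual kfd_mem_attach() internals:

/* For an N-GPU mapping of one userptr range: share the original BO
 * where its DMA addresses can be reused, otherwise wrap the same page
 * list in an SG BO so this GPU gets its own DMA mapping. Either way,
 * hmm_range_fault() runs once and only one MMU notifier is registered.
 */
for (i = 0; i < n_gpus; i++) {
	struct amdgpu_device *adev = gpus[i];

	if (adev == bo_adev || can_reuse_dma(adev, bo_adev)) {
		attachment[i]->type = KFD_MEM_ATT_SHARED;
		bo[i] = amdgpu_bo_ref(mem->bo);	/* just take a reference */
	} else {
		attachment[i]->type = KFD_MEM_ATT_USERPTR;
		bo[i] = create_sg_attachment(adev, mem); /* new SG BO, new DMA mapping */
	}
}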
>>
>> I think this patch fixes a potential issue with that optimization.
>> There is an implicit assumption that the DMA addresses stored in the
>> original userptr BO are a direct mapping, so we can reuse them on
>> other GPUs that also use a direct mapping. But we didn't actually
>> check that assumption. I think this patch fixes that for systems
>> where system memory is direct mapped on some but not all GPUs.
>
> We will need a similar optimization in svm_range_dma_map to reuse
> prange->dma_addr for GPUs in the same IOMMU group or with a direct
> mapping.
Good point. Optimizing for direct mapping is easy: just skip the DMA
mapping. Reusing the same DMA mapping for multiple GPUs in the same
IOMMU group is a bit harder. First you need to identify all the GPUs
that are in the same group, and then check whether one of them already
has a DMA mapping that can be reused. I wonder how common this is, and
whether it's worth the trouble.
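A minimal sketch of that group lookup (the surrounding bookkeeping is
hypothetical; only dev->iommu_group comes from the patch): given the GPUs
that already have a DMA mapping for the range, find one in the same IOMMU
group as the new device, so its dma_addr array could be reused instead of
mapping the pages again.

#include <linux/device.h>

static struct amdgpu_device *
find_mapped_gpu_in_same_group(struct amdgpu_device *adev,
			      struct amdgpu_device **mapped, int n_mapped)
{
	int i;

	for (i = 0; i < n_mapped; i++) {
		if (mapped[i] != adev &&
		    mapped[i]->dev->iommu_group == adev->dev->iommu_group)
			return mapped[i];	/* reuse this GPU's DMA mapping */
	}
	return NULL;				/* no reuse possible, map again */
}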
Regards,
Felix
>
> Regards,
>
> Philip
>
>>
>> This scenario should probably be called out explicitly in the patch
>> description. The condition is also getting pretty hard to read and
>> understand. Maybe move the both-direct-map-or-same-iommu-group
>> conditions into a helper function, say
>> "amdgpu_ttm_tt_get_usermm(mem->bo->tbo.ttm) && reuse_dmamap(adev,
>> bo_adev)".
>>
>> Regards,
>> Felix
>>
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>>
>>>> Best Regards,
>>>> Shane
>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>> Signed-off-by: Shane Xiao <shane.xiao at amd.com>
>>>>>> ---
>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 8 ++++----
>>>>>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>>> index e7403f8e4eba..33cda358cb9e 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>>> @@ -804,11 +804,11 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
>>>>>>                   va + bo_size, vm);
>>>>>>
>>>>>>       if ((adev == bo_adev && !(mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)) ||
>>>>>> -         (amdgpu_ttm_tt_get_usermm(mem->bo->tbo.ttm) && adev->ram_is_direct_mapped) ||
>>>>>> -         same_hive) {
>>>>>> +         (amdgpu_ttm_tt_get_usermm(mem->bo->tbo.ttm) && ((adev->ram_is_direct_mapped && bo_adev->ram_is_direct_mapped) ||
>>>>>> +         adev->dev->iommu_group == bo_adev->dev->iommu_group)) ||
>>>>>> +same_hive){
>>>>>>           /* Mappings on the local GPU, or VRAM mappings in the
>>>>>> -          * local hive, or userptr mapping IOMMU direct map mode
>>>>>> -          * share the original BO
>>>>>> +          * local hive, or userptr mapping in the same dma
>>>>>> +          * address space share the original BO
>>>>>>           */
>>>>>>           attachment[i]->type = KFD_MEM_ATT_SHARED;
>>>>>>           bo[i] = mem->bo;
>>>