[PATCH v3] drm/amdkfd: fix dma mapping leaking warning
Felix Kuehling
felix.kuehling at amd.com
Fri Sep 17 19:59:24 UTC 2021
Can you also add a Fixes: tag to make it easier to identify stable
branches for backports? I think it's this commit:
commit 1d5dbfe6c06a5269b535f8e6b13569f32c42ea60
Author: Alex Sierra <alex.sierra at amd.com>
Date: Wed May 5 14:15:50 2021 -0500
drm/amdkfd: classify and map mixed svm range pages in GPU
[Why]
svm ranges can have mixed pages from device or system memory.
A good example: after a prange has been allocated in VRAM, a
copy-on-write triggered by a fork invalidates some pages inside
the prange, ending up with mixed pages.
[How]
Classify each page inside a prange by its type, device or system
memory, during the dma mapping call. If a page belongs to the VRAM
domain, a flag is set in its dma_addr entry for each GPU. Then, at
GPU page table mapping time, each group of contiguous pages of the
same type is mapped with the proper pte flags.
v2:
Instead of using ttm_res to calculate vram pfns in the svm_range,
this is now done by setting the real VRAM physical address into the
dma_addr array. This makes VRAM management more flexible and removes
the need to have a BO reference in the svm_range.
v3:
Remove mapping member from svm_range
Signed-off-by: Alex Sierra <alex.sierra at amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>
Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
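For context, the essence of that commit as I read it: for device pages,
svm_range_dma_map_dev stores the real VRAM physical address, tagged with
SVM_RANGE_VRAM_DOMAIN, in prange->dma_addr instead of creating a dma
mapping. Very roughly, with vram_phys_addr() as a made-up stand-in for
the real pfn-to-VRAM-address calculation, the per-page classification
looks like this (sketch only, not the actual code):

	for (i = 0; i < npages; i++) {
		page = hmm_pfn_to_page(hmm_pfns[i]);
		if (is_zone_device_page(page)) {
			/* VRAM page: keep its real physical address in the
			 * dma_addr array and tag it with the domain flag in
			 * a low bit that a page-aligned address never uses;
			 * no dma mapping is created for it.
			 */
			addr[i] = vram_phys_addr(hmm_pfns[i]) |
				  SVM_RANGE_VRAM_DOMAIN;
			continue;
		}
		/* System memory page: dma map it as before. */
		addr[i] = dma_map_page(dev, page, 0, PAGE_SIZE, dir);
		if (dma_mapping_error(dev, addr[i]))
			return -EFAULT;
	}

GPU page table mapping then walks dma_addr and maps each run of entries
with the same flag value using the appropriate pte flags.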
Thanks,
Felix
On 2021-09-17 3:52 p.m., Felix Kuehling wrote:
> On 2021-09-17 2:04 p.m., Philip Yang wrote:
>> For xnack off, the restore work dma unmaps the previous system memory
>> page and dma maps the updated system memory page to update the GPU
>> mapping. This is not a dma mapping leak, so remove the WARN_ONCE for
>> dma mapping leaking.
>>
>> prange->dma_addr stores the VRAM page pfn after the range is migrated
>> to VRAM; VRAM pages should not be dma unmapped when updating the GPU
>> mapping or removing the prange. Add a helper,
>> svm_is_valid_dma_mapping_addr, to check for VRAM pages and error cases.
>>
>> Mask out the SVM_RANGE_VRAM_DOMAIN flag in dma_addr before calling the
>> amdgpu vm update to avoid BUG_ON(*addr & 0xFFFF00000000003FULL), and set
>> it again immediately after. This flag is used later to determine the
>> page type, so that only system memory pages are dma unmapped.
>>
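The mask-out half of that is not visible in the hunks quoted below, so
for anyone reading along, my understanding of the intended pattern in
svm_range_map_to_gpu is roughly the following, using the function's
existing locals (a sketch of the description above, not the actual hunk):

	for (i = offset; i < offset + npages; i++) {
		last_domain = dma_addr[i] & SVM_RANGE_VRAM_DOMAIN;
		/* Strip the flag so the amdgpu vm update only sees clean,
		 * page-aligned addresses and its
		 * BUG_ON(*addr & 0xFFFF00000000003FULL) does not trigger.
		 */
		dma_addr[i] &= ~SVM_RANGE_VRAM_DOMAIN;

		/* ... when the run of same-domain pages ends, map
		 * [last_start, prange->start + i] with pte flags derived
		 * from last_domain, then immediately restore the flag on
		 * every entry of that run so svm_range_dma_unmap() can
		 * still tell VRAM entries apart from system memory pages:
		 */
		for (j = last_start; j <= prange->start + i; j++)
			dma_addr[j - prange->start] |= last_domain;
	}
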
>> Signed-off-by: Philip Yang <Philip.Yang at amd.com>
>> ---
>> drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 18 ++++++++++++++----
>> 1 file changed, 14 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> index 57318edc4020..e47f1219ad84 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> @@ -118,6 +118,13 @@ static void svm_range_remove_notifier(struct svm_range *prange)
>> mmu_interval_notifier_remove(&prange->notifier);
>> }
>> +static bool
>> +svm_is_valid_dma_mapping_addr(struct device *dev, dma_addr_t dma_addr)
>> +{
>> + return dma_addr && !dma_mapping_error(dev, dma_addr) &&
>> + !(dma_addr & SVM_RANGE_VRAM_DOMAIN);
>> +}
>> +
>> static int
>> svm_range_dma_map_dev(struct amdgpu_device *adev, struct svm_range *prange,
>> unsigned long offset, unsigned long npages,
>> @@ -139,8 +146,7 @@ svm_range_dma_map_dev(struct amdgpu_device *adev, struct svm_range *prange,
>> addr += offset;
>> for (i = 0; i < npages; i++) {
>> - if (WARN_ONCE(addr[i] && !dma_mapping_error(dev, addr[i]),
>> - "leaking dma mapping\n"))
>> + if (svm_is_valid_dma_mapping_addr(dev, addr[i]))
>> dma_unmap_page(dev, addr[i], PAGE_SIZE, dir);
>> page = hmm_pfn_to_page(hmm_pfns[i]);
>> @@ -209,7 +215,7 @@ void svm_range_dma_unmap(struct device *dev, dma_addr_t *dma_addr,
>> return;
>> for (i = offset; i < offset + npages; i++) {
>> - if (!dma_addr[i] || dma_mapping_error(dev, dma_addr[i]))
>> + if (!svm_is_valid_dma_mapping_addr(dev, dma_addr[i]))
>> continue;
>> pr_debug("dma unmapping 0x%llx\n", dma_addr[i] >> PAGE_SHIFT);
>> dma_unmap_page(dev, dma_addr[i], PAGE_SIZE, dir);
>> @@ -1165,7 +1171,7 @@ svm_range_map_to_gpu(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>> unsigned long last_start;
>> int last_domain;
>> int r = 0;
>> - int64_t i;
>> + int64_t i, j;
>> last_start = prange->start + offset;
>> @@ -1205,6 +1211,10 @@ svm_range_map_to_gpu(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>> NULL, dma_addr,
>> &vm->last_update,
>> &table_freed);
>> +
>> + for (j = last_start; j <= prange->start + i; j++)
>> + dma_addr[j - prange->start] |= last_domain;
>
> This would be slightly more readable like this:
>
> for (j = last_start - prange->start; j <= i; j++)
>         dma_addr[j] |= last_domain;
>
> Other than that, the patch is
>
> Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>
>
>
>> +
>> if (r) {
>> pr_debug("failed %d to map to gpu 0x%lx\n", r,
>> prange->start);
>> goto out;