[PATCH] drm/kfd: fix ttm_bo_release warning

Yu, Lang Lang.Yu at amd.com
Fri Sep 24 06:34:32 UTC 2021


[AMD Official Use Only]



>-----Original Message-----
>From: Koenig, Christian <Christian.Koenig at amd.com>
>Sent: Friday, September 24, 2021 1:54 PM
>To: Yu, Lang <Lang.Yu at amd.com>; amd-gfx at lists.freedesktop.org
>Cc: Kuehling, Felix <Felix.Kuehling at amd.com>; Huang, Ray
><Ray.Huang at amd.com>
>Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>
>
>Am 24.09.21 um 07:50 schrieb Yu, Lang:
>> [AMD Official Use Only]
>>> [SNIP]
>>>>>> Hi Christian,
>>>>>>
>>>>>> Thanks for your explanation and advice. I got your point.
>>>>>> Actually, these BOs are allocated and pinned during a kfd process lifecycle.
>>>>>> I will try to add a flag into struct kgd_mem to indicate which BO
>>>>>> is pined and should be unpinned. Which will make
>>>>>> amdgpu_bo_pin/amdgpu_bo_unpin calls balanced. Thanks!
>>>>> That isn't to much better. The real solution would be to unpin them
>>>>> when the kfd process is destroyed.
>>>> Yes, will unpin them when the kfd process is destroyed.
>>>> But we should indicate which BO is pinned and should be unpinned. Right?
>>> Well not with a flag or something like that.
>>>
>>> The knowledge which BO is pinned and needs to be unpinned should come
>>> from the control logic and not be papered over by some general handling.
>>> That's the background why we have removed the general handling for
>>> this from TTM in the first place.
>>>
>>> In other words when you need to pin a BO because it is kmapped it
>>> should be unpinned when it is kunmapped and if you don't kunmap at
>>> all then there is something wrong with the code structure from a higher level
>point of view.
>> Yes, this function "amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel" did a
>> kmap, but without a kunmap when the kfd process is destroyed. The flag
>actually indicates kmap/kunmap.
>
>Well that's the wrong approach then. I mean you need to have the BO reference
>and the mapped pointer somewhere, don't you?
>
>How do you clean those up?

They are respectively cleaned by " kfd_process_device_free_bos " and " kfd_process_destroy_pdds".
Let me put the code here. Thanks!

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index ec028cf963f5..d65b3bf13fd8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -81,6 +81,7 @@ struct kgd_mem {

        bool aql_queue;
        bool is_imported;
+       bool is_mapped_to_kernel;
 };

 /* KFD Memory Eviction */
@@ -278,6 +279,8 @@ int amdgpu_amdkfd_gpuvm_sync_memory(
                struct kgd_dev *kgd, struct kgd_mem *mem, bool intr);
 int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd,
                struct kgd_mem *mem, void **kptr, uint64_t *size);
+void amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(struct kgd_dev *kgd,
+               struct kgd_mem *mem);
 int amdgpu_amdkfd_gpuvm_restore_process_bos(void *process_info,
                                            struct dma_fence **ef);
 int amdgpu_amdkfd_gpuvm_get_vm_fault_info(struct kgd_dev *kgd,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 2d6b2d77b738..45ccbe9f63ee 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1857,6 +1857,8 @@ int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd,

        amdgpu_bo_unreserve(bo);

+       mem->is_mapped_to_kernel = true;
+
        mutex_unlock(&mem->process_info->lock);
        return 0;

@@ -1870,6 +1872,20 @@ int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd,
        return ret;
 }

+void amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(struct kgd_dev *kgd, struct kgd_mem *mem)
+{
+       struct amdgpu_bo *bo = mem->bo;
+
+       if (!mem->is_mapped_to_kernel)
+               return;
+
+       amdgpu_bo_reserve(bo, true);
+       amdgpu_bo_kunmap(bo);
+       amdgpu_bo_unpin(bo);
+       amdgpu_bo_unreserve(bo);
+       mem->is_mapped_to_kernel = false;
+}
+
 int amdgpu_amdkfd_gpuvm_get_vm_fault_info(struct kgd_dev *kgd,
                                              struct kfd_vm_fault_info *mem)
 {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 21ec8a18cad2..f5506b153aed 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -941,6 +941,8 @@ static void kfd_process_device_free_bos(struct kfd_process_device *pdd)
                                peer_pdd->dev->kgd, mem, peer_pdd->drm_priv);
                }

+               amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(pdd->dev->kgd, mem);
+
                amdgpu_amdkfd_gpuvm_free_memory_of_gpu(pdd->dev->kgd, mem,
                                                       pdd->drm_priv, NULL);
                kfd_process_device_remove_obj_handle(pdd, id);

Regards,
Lang

>Regards,
>Christian.


More information about the amd-gfx mailing list