[PATCH] drm/kfd: fix ttm_bo_release warning

Yu, Lang Lang.Yu at amd.com
Fri Sep 24 05:50:06 UTC 2021


[AMD Official Use Only]



>-----Original Message-----
>From: Koenig, Christian <Christian.Koenig at amd.com>
>Sent: Friday, September 24, 2021 1:42 PM
>To: Yu, Lang <Lang.Yu at amd.com>; amd-gfx at lists.freedesktop.org
>Cc: Kuehling, Felix <Felix.Kuehling at amd.com>; Huang, Ray
><Ray.Huang at amd.com>
>Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>
>Am 24.09.21 um 07:35 schrieb Yu, Lang:
>> [AMD Official Use Only]
>>> -----Original Message-----
>>> From: Koenig, Christian <Christian.Koenig at amd.com>
>>> Sent: Thursday, September 23, 2021 10:52 PM
>>> To: Yu, Lang <Lang.Yu at amd.com>; amd-gfx at lists.freedesktop.org
>>> Cc: Kuehling, Felix <Felix.Kuehling at amd.com>; Huang, Ray
>>> <Ray.Huang at amd.com>
>>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>>>
>>> Am 23.09.21 um 16:24 schrieb Yu, Lang:
>>>> [AMD Official Use Only]
>>>>> -----Original Message-----
>>>>> From: Koenig, Christian <Christian.Koenig at amd.com>
>>>>> Sent: Thursday, September 23, 2021 8:24 PM
>>>>> To: Yu, Lang <Lang.Yu at amd.com>; amd-gfx at lists.freedesktop.org
>>>>> Cc: Kuehling, Felix <Felix.Kuehling at amd.com>; Christian K nig
>>>>> <C3B6christian.koenig at amd.com>; Huang, Ray <Ray.Huang at amd.com>
>>>>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>>>>>
>>>>> Am 23.09.21 um 14:09 schrieb Yu, Lang:
>>>>>> [AMD Official Use Only]
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Koenig, Christian <Christian.Koenig at amd.com>
>>>>>>> Sent: Thursday, September 23, 2021 7:40 PM
>>>>>>> To: Yu, Lang <Lang.Yu at amd.com>; amd-gfx at lists.freedesktop.org
>>>>>>> Cc: Kuehling, Felix <Felix.Kuehling at amd.com>; Christian K nig
>>>>>>> <C3B6christian.koenig at amd.com>; Huang, Ray <Ray.Huang at amd.com>
>>>>>>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Am 23.09.21 um 11:44 schrieb Lang Yu:
>>>>>>>> If a BO is pinned, unpin it before freeing it.
>>>>>>>>
>>>>>>>> Call Trace:
>>>>>>>> 	ttm_bo_put+0x30/0x50 [ttm]
>>>>>>>> 	amdgpu_bo_unref+0x1e/0x30 [amdgpu]
>>>>>>>> 	amdgpu_gem_object_free+0x34/0x50 [amdgpu]
>>>>>>>> 	drm_gem_object_free+0x1d/0x30 [drm]
>>>>>>>> 	amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x31f/0x3a0
>[amdgpu]
>>>>>>>> 	kfd_process_device_free_bos+0xa3/0xf0 [amdgpu]
>>>>>>>> 	kfd_process_wq_release+0x224/0x2e0 [amdgpu]
>>>>>>>> 	process_one_work+0x220/0x3c0
>>>>>>>> 	worker_thread+0x4d/0x3f0
>>>>>>>> 	kthread+0x114/0x150
>>>>>>>> 	process_one_work+0x3c0/0x3c0
>>>>>>>> 	kthread_park+0x90/0x90
>>>>>>>> 	ret_from_fork+0x22/0x30
>>>>>>>>
>>>>>>>> Signed-off-by: Lang Yu <lang.yu at amd.com>
>>>>>>>> ---
>>>>>>>>      drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +++
>>>>>>>>      1 file changed, 3 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>>>>> index 2d6b2d77b738..7e693b064072 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>>>>> @@ -1567,6 +1567,9 @@ int
>>>>> amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>>>>>>>>      	pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va,
>>>>>>>>      		mem->va + bo_size * (1 + mem->aql_queue));
>>>>>>>>
>>>>>>>> +	if (mem->bo->tbo.pin_count)
>>>>>>>> +		amdgpu_bo_unpin(mem->bo);
>>>>>>>> +
>>>>>>> NAK, using mem->bo->tbo.pin_count like this is illegal.
>>>>>> I didn't  get your point. I referred to function-"void
>>>>>> amdgpu_bo_unpin(struct amdgpu_bo *bo)", which uses it like this.
>>>>> What amdgpu_bo_unpin() does is the following:
>>>>>
>>>>>           ttm_bo_unpin(&bo->tbo);
>>>>>           if (bo->tbo.pin_count)
>>>>>                   return;
>>>>> ....
>>>>>
>>>>> In other words we take further actions based on if the buffer us is
>>>>> still pinned or not after an unpin operation.
>>>>>
>>>>> What you try to do here is unpinning the BO when it is pinned
>>>>> independent if somebody else or our code has pinned it before.
>>>> Hi Christian,
>>>>
>>>> Thanks for your explanation and advice. I got your point.
>>>> Actually, these BOs are allocated and pinned during a kfd process lifecycle.
>>>> I will try to add a flag into struct kgd_mem to indicate which BO is
>>>> pined and should be unpinned. Which will make
>>>> amdgpu_bo_pin/amdgpu_bo_unpin calls balanced. Thanks!
>>> That isn't to much better. The real solution would be to unpin them
>>> when the kfd process is destroyed.
>> Yes, will unpin them when the kfd process is destroyed.
>> But we should indicate which BO is pinned and should be unpinned. Right?
>
>Well not with a flag or something like that.
>
>The knowledge which BO is pinned and needs to be unpinned should come from
>the control logic and not be papered over by some general handling.
>That's the background why we have removed the general handling for this from
>TTM in the first place.
>
>In other words when you need to pin a BO because it is kmapped it should be
>unpinned when it is kunmapped and if you don't kunmap at all then there is
>something wrong with the code structure from a higher level point of view.

Yes, this function "amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel" did a kmap,
but without a kunmap when the kfd process is destroyed. The flag actually indicates kmap/kunmap.

Regards,
Lang

>Regards,
>Christian.
>
>>
>> Regards,
>> Lang
>>
>>> Regards,
>>> Christian.
>>>
>>>> Regards,
>>>> Lang
>>>>> That can lead to all kind of problems and is clearly illegal.
>>>>>
>>>>>>> Where was the BO pinned in the first place?
>>>>>> I found two places:
>>>>>>
>>>>>> 	ret = kfd_process_alloc_gpuvm(pdd, qpd->ib_base, PAGE_SIZE, flags,
>>>>>> 				      &kaddr);
>>>>>>
>>>>>> 	ret = kfd_process_alloc_gpuvm(pdd, qpd->cwsr_base,
>>>>>> 				      KFD_CWSR_TBA_TMA_SIZE, flags, &kaddr);
>>>>> Well then you need to figure out where that memory is freed again
>>>>> and place the unpin appropriately.
>>>>>
>>>>> General rule of thumb is that calls to
>>>>> amdgpu_bo_pin/amdgpu_bo_unpin should be balanced.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>> Regards,
>>>>>> Lang
>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>>>      	ret = unreserve_bo_and_vms(&ctx, false, false);
>>>>>>>>
>>>>>>>>      	/* Remove from VM internal data structures */


More information about the amd-gfx mailing list