[PATCH] drm/kfd: fix ttm_bo_release warning

Christian König christian.koenig at amd.com
Fri Sep 24 05:42:07 UTC 2021


Am 24.09.21 um 07:35 schrieb Yu, Lang:
> [AMD Official Use Only]
>> -----Original Message-----
>> From: Koenig, Christian <Christian.Koenig at amd.com>
>> Sent: Thursday, September 23, 2021 10:52 PM
>> To: Yu, Lang <Lang.Yu at amd.com>; amd-gfx at lists.freedesktop.org
>> Cc: Kuehling, Felix <Felix.Kuehling at amd.com>; Huang, Ray
>> <Ray.Huang at amd.com>
>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>>
>> Am 23.09.21 um 16:24 schrieb Yu, Lang:
>>> [AMD Official Use Only]
>>>> -----Original Message-----
>>>> From: Koenig, Christian <Christian.Koenig at amd.com>
>>>> Sent: Thursday, September 23, 2021 8:24 PM
>>>> To: Yu, Lang <Lang.Yu at amd.com>; amd-gfx at lists.freedesktop.org
>>>> Cc: Kuehling, Felix <Felix.Kuehling at amd.com>; Christian K nig
>>>> <C3B6christian.koenig at amd.com>; Huang, Ray <Ray.Huang at amd.com>
>>>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>>>>
>>>> Am 23.09.21 um 14:09 schrieb Yu, Lang:
>>>>> [AMD Official Use Only]
>>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Koenig, Christian <Christian.Koenig at amd.com>
>>>>>> Sent: Thursday, September 23, 2021 7:40 PM
>>>>>> To: Yu, Lang <Lang.Yu at amd.com>; amd-gfx at lists.freedesktop.org
>>>>>> Cc: Kuehling, Felix <Felix.Kuehling at amd.com>; Christian K nig
>>>>>> <C3B6christian.koenig at amd.com>; Huang, Ray <Ray.Huang at amd.com>
>>>>>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>>>>>>
>>>>>>
>>>>>>
>>>>>> Am 23.09.21 um 11:44 schrieb Lang Yu:
>>>>>>> If a BO is pinned, unpin it before freeing it.
>>>>>>>
>>>>>>> Call Trace:
>>>>>>> 	ttm_bo_put+0x30/0x50 [ttm]
>>>>>>> 	amdgpu_bo_unref+0x1e/0x30 [amdgpu]
>>>>>>> 	amdgpu_gem_object_free+0x34/0x50 [amdgpu]
>>>>>>> 	drm_gem_object_free+0x1d/0x30 [drm]
>>>>>>> 	amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x31f/0x3a0 [amdgpu]
>>>>>>> 	kfd_process_device_free_bos+0xa3/0xf0 [amdgpu]
>>>>>>> 	kfd_process_wq_release+0x224/0x2e0 [amdgpu]
>>>>>>> 	process_one_work+0x220/0x3c0
>>>>>>> 	worker_thread+0x4d/0x3f0
>>>>>>> 	kthread+0x114/0x150
>>>>>>> 	process_one_work+0x3c0/0x3c0
>>>>>>> 	kthread_park+0x90/0x90
>>>>>>> 	ret_from_fork+0x22/0x30
>>>>>>>
>>>>>>> Signed-off-by: Lang Yu <lang.yu at amd.com>
>>>>>>> ---
>>>>>>>      drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +++
>>>>>>>      1 file changed, 3 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>>>> index 2d6b2d77b738..7e693b064072 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>>>> @@ -1567,6 +1567,9 @@ int
>>>> amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>>>>>>>      	pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va,
>>>>>>>      		mem->va + bo_size * (1 + mem->aql_queue));
>>>>>>>
>>>>>>> +	if (mem->bo->tbo.pin_count)
>>>>>>> +		amdgpu_bo_unpin(mem->bo);
>>>>>>> +
>>>>>> NAK, using mem->bo->tbo.pin_count like this is illegal.
>>>>> I didn't  get your point. I referred to function-"void
>>>>> amdgpu_bo_unpin(struct amdgpu_bo *bo)", which uses it like this.
>>>> What amdgpu_bo_unpin() does is the following:
>>>>
>>>>           ttm_bo_unpin(&bo->tbo);
>>>>           if (bo->tbo.pin_count)
>>>>                   return;
>>>> ....
>>>>
>>>> In other words we take further actions based on if the buffer us is
>>>> still pinned or not after an unpin operation.
>>>>
>>>> What you try to do here is unpinning the BO when it is pinned
>>>> independent if somebody else or our code has pinned it before.
>>> Hi Christian,
>>>
>>> Thanks for your explanation and advice. I got your point.
>>> Actually, these BOs are allocated and pinned during a kfd process lifecycle.
>>> I will try to add a flag into struct kgd_mem to indicate which BO is
>>> pined and should be unpinned. Which will make
>>> amdgpu_bo_pin/amdgpu_bo_unpin calls balanced. Thanks!
>> That isn't to much better. The real solution would be to unpin them when the kfd
>> process is destroyed.
> Yes, will unpin them when the kfd process is destroyed.
> But we should indicate which BO is pinned and should be unpinned. Right?

Well not with a flag or something like that.

The knowledge which BO is pinned and needs to be unpinned should come 
from the control logic and not be papered over by some general handling. 
That's the background why we have removed the general handling for this 
from TTM in the first place.

In other words when you need to pin a BO because it is kmapped it should 
be unpinned when it is kunmapped and if you don't kunmap at all then 
there is something wrong with the code structure from a higher level 
point of view.

Regards,
Christian.

>
> Regards,
> Lang
>   
>> Regards,
>> Christian.
>>
>>> Regards,
>>> Lang
>>>> That can lead to all kind of problems and is clearly illegal.
>>>>
>>>>>> Where was the BO pinned in the first place?
>>>>> I found two places:
>>>>>
>>>>> 	ret = kfd_process_alloc_gpuvm(pdd, qpd->ib_base, PAGE_SIZE, flags,
>>>>> 				      &kaddr);
>>>>>
>>>>> 	ret = kfd_process_alloc_gpuvm(pdd, qpd->cwsr_base,
>>>>> 				      KFD_CWSR_TBA_TMA_SIZE, flags, &kaddr);
>>>> Well then you need to figure out where that memory is freed again and
>>>> place the unpin appropriately.
>>>>
>>>> General rule of thumb is that calls to amdgpu_bo_pin/amdgpu_bo_unpin
>>>> should be balanced.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> Regards,
>>>>> Lang
>>>>>
>>>>>> Christian.
>>>>>>
>>>>>>>      	ret = unreserve_bo_and_vms(&ctx, false, false);
>>>>>>>
>>>>>>>      	/* Remove from VM internal data structures */



More information about the amd-gfx mailing list