[PATCH] drm/kfd: fix ttm_bo_release warning

Christian König christian.koenig at amd.com
Thu Sep 23 14:52:28 UTC 2021


Am 23.09.21 um 16:24 schrieb Yu, Lang:
> [AMD Official Use Only]
>> -----Original Message-----
>> From: Koenig, Christian <Christian.Koenig at amd.com>
>> Sent: Thursday, September 23, 2021 8:24 PM
>> To: Yu, Lang <Lang.Yu at amd.com>; amd-gfx at lists.freedesktop.org
>> Cc: Kuehling, Felix <Felix.Kuehling at amd.com>; Christian K nig
>> <C3B6christian.koenig at amd.com>; Huang, Ray <Ray.Huang at amd.com>
>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>>
>> Am 23.09.21 um 14:09 schrieb Yu, Lang:
>>> [AMD Official Use Only]
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Koenig, Christian <Christian.Koenig at amd.com>
>>>> Sent: Thursday, September 23, 2021 7:40 PM
>>>> To: Yu, Lang <Lang.Yu at amd.com>; amd-gfx at lists.freedesktop.org
>>>> Cc: Kuehling, Felix <Felix.Kuehling at amd.com>; Christian K nig
>>>> <C3B6christian.koenig at amd.com>; Huang, Ray <Ray.Huang at amd.com>
>>>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>>>>
>>>>
>>>>
>>>> Am 23.09.21 um 11:44 schrieb Lang Yu:
>>>>> If a BO is pinned, unpin it before freeing it.
>>>>>
>>>>> Call Trace:
>>>>> 	ttm_bo_put+0x30/0x50 [ttm]
>>>>> 	amdgpu_bo_unref+0x1e/0x30 [amdgpu]
>>>>> 	amdgpu_gem_object_free+0x34/0x50 [amdgpu]
>>>>> 	drm_gem_object_free+0x1d/0x30 [drm]
>>>>> 	amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x31f/0x3a0 [amdgpu]
>>>>> 	kfd_process_device_free_bos+0xa3/0xf0 [amdgpu]
>>>>> 	kfd_process_wq_release+0x224/0x2e0 [amdgpu]
>>>>> 	process_one_work+0x220/0x3c0
>>>>> 	worker_thread+0x4d/0x3f0
>>>>> 	kthread+0x114/0x150
>>>>> 	process_one_work+0x3c0/0x3c0
>>>>> 	kthread_park+0x90/0x90
>>>>> 	ret_from_fork+0x22/0x30
>>>>>
>>>>> Signed-off-by: Lang Yu <lang.yu at amd.com>
>>>>> ---
>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +++
>>>>>     1 file changed, 3 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>> index 2d6b2d77b738..7e693b064072 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>>> @@ -1567,6 +1567,9 @@ int
>> amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>>>>>     	pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va,
>>>>>     		mem->va + bo_size * (1 + mem->aql_queue));
>>>>>
>>>>> +	if (mem->bo->tbo.pin_count)
>>>>> +		amdgpu_bo_unpin(mem->bo);
>>>>> +
>>>> NAK, using mem->bo->tbo.pin_count like this is illegal.
>>> I didn't  get your point. I referred to function-"void
>>> amdgpu_bo_unpin(struct amdgpu_bo *bo)", which uses it like this.
>> What amdgpu_bo_unpin() does is the following:
>>
>>          ttm_bo_unpin(&bo->tbo);
>>          if (bo->tbo.pin_count)
>>                  return;
>> ....
>>
>> In other words we take further actions based on if the buffer us is still pinned or
>> not after an unpin operation.
>>
>> What you try to do here is unpinning the BO when it is pinned independent if
>> somebody else or our code has pinned it before.
> Hi Christian,
>
> Thanks for your explanation and advice. I got your point.
> Actually, these BOs are allocated and pinned during a kfd process lifecycle.
> I will try to add a flag into struct kgd_mem to indicate which BO is pined
> and should be unpinned. Which will make amdgpu_bo_pin/amdgpu_bo_unpin
> calls balanced. Thanks!

That isn't to much better. The real solution would be to unpin them when 
the kfd process is destroyed.

Regards,
Christian.

>
> Regards,
> Lang
>> That can lead to all kind of problems and is clearly illegal.
>>
>>>> Where was the BO pinned in the first place?
>>> I found two places:
>>>
>>> 	ret = kfd_process_alloc_gpuvm(pdd, qpd->ib_base, PAGE_SIZE, flags,
>>> 				      &kaddr);
>>>
>>> 	ret = kfd_process_alloc_gpuvm(pdd, qpd->cwsr_base,
>>> 				      KFD_CWSR_TBA_TMA_SIZE, flags, &kaddr);
>> Well then you need to figure out where that memory is freed again and place the
>> unpin appropriately.
>>
>> General rule of thumb is that calls to amdgpu_bo_pin/amdgpu_bo_unpin should
>> be balanced.
>>
>> Regards,
>> Christian.
>>
>>> Regards,
>>> Lang
>>>
>>>> Christian.
>>>>
>>>>>     	ret = unreserve_bo_and_vms(&ctx, false, false);
>>>>>
>>>>>     	/* Remove from VM internal data structures */



More information about the amd-gfx mailing list