[PATCH] drm/kfd: fix ttm_bo_release warning

Christian König christian.koenig at amd.com
Thu Sep 23 12:23:55 UTC 2021


Am 23.09.21 um 14:09 schrieb Yu, Lang:
> [AMD Official Use Only]
>
>
>
>> -----Original Message-----
>> From: Koenig, Christian <Christian.Koenig at amd.com>
>> Sent: Thursday, September 23, 2021 7:40 PM
>> To: Yu, Lang <Lang.Yu at amd.com>; amd-gfx at lists.freedesktop.org
>> Cc: Kuehling, Felix <Felix.Kuehling at amd.com>; Christian K nig
>> <C3B6christian.koenig at amd.com>; Huang, Ray <Ray.Huang at amd.com>
>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning
>>
>>
>>
>> Am 23.09.21 um 11:44 schrieb Lang Yu:
>>> If a BO is pinned, unpin it before freeing it.
>>>
>>> Call Trace:
>>> 	ttm_bo_put+0x30/0x50 [ttm]
>>> 	amdgpu_bo_unref+0x1e/0x30 [amdgpu]
>>> 	amdgpu_gem_object_free+0x34/0x50 [amdgpu]
>>> 	drm_gem_object_free+0x1d/0x30 [drm]
>>> 	amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x31f/0x3a0 [amdgpu]
>>> 	kfd_process_device_free_bos+0xa3/0xf0 [amdgpu]
>>> 	kfd_process_wq_release+0x224/0x2e0 [amdgpu]
>>> 	process_one_work+0x220/0x3c0
>>> 	worker_thread+0x4d/0x3f0
>>> 	kthread+0x114/0x150
>>> 	process_one_work+0x3c0/0x3c0
>>> 	kthread_park+0x90/0x90
>>> 	ret_from_fork+0x22/0x30
>>>
>>> Signed-off-by: Lang Yu <lang.yu at amd.com>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +++
>>>    1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> index 2d6b2d77b738..7e693b064072 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> @@ -1567,6 +1567,9 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>>>    	pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va,
>>>    		mem->va + bo_size * (1 + mem->aql_queue));
>>>
>>> +	if (mem->bo->tbo.pin_count)
>>> +		amdgpu_bo_unpin(mem->bo);
>>> +
>> NAK, using mem->bo->tbo.pin_count like this is illegal.
> I didn't  get your point. I referred to function-"void amdgpu_bo_unpin(struct amdgpu_bo *bo)",
> which uses it like this.

What amdgpu_bo_unpin() does is the following:

         ttm_bo_unpin(&bo->tbo);
         if (bo->tbo.pin_count)
                 return;
....

In other words we take further actions based on if the buffer us is 
still pinned or not after an unpin operation.

What you try to do here is unpinning the BO when it is pinned 
independent if somebody else or our code has pinned it before.

That can lead to all kind of problems and is clearly illegal.

>> Where was the BO pinned in the first place?
> I found two places:
>
> 	ret = kfd_process_alloc_gpuvm(pdd, qpd->ib_base, PAGE_SIZE, flags,
> 				      &kaddr);
>
> 	ret = kfd_process_alloc_gpuvm(pdd, qpd->cwsr_base,
> 				      KFD_CWSR_TBA_TMA_SIZE, flags, &kaddr);

Well then you need to figure out where that memory is freed again and 
place the unpin appropriately.

General rule of thumb is that calls to amdgpu_bo_pin/amdgpu_bo_unpin 
should be balanced.

Regards,
Christian.

> Regards,
> Lang
>
>> Christian.
>>
>>>    	ret = unreserve_bo_and_vms(&ctx, false, false);
>>>
>>>    	/* Remove from VM internal data structures */



More information about the amd-gfx mailing list