[PATCH] amdkfd: wq_release signals dma_fence only when available

Chen, Xiaogang xiaogang.chen at amd.com
Thu Dec 12 16:17:37 UTC 2024


On 12/11/2024 11:30 PM, Zhu Lingshan wrote:
> On 12/12/2024 12:19 PM, Felix Kuehling wrote:
>> On 2024-12-11 22:06, Zhu Lingshan wrote:
>>> kfd_process_wq_release() signals the eviction fence with
>>> dma_fence_signal(), which warns if the dma_fence
>>> is NULL.
>> That's news to me. Looking at the dma_fence_signal implementation on amd-staging-drm-next, it just silently returns -EINVAL if the fence pointer is NULL. I see the same in Linux 6.12.4: https://elixir.bootlin.com/linux/v6.12.4/source/drivers/dma-buf/dma-fence.c#L467
>>
>> Which branch are you on?
> Linus tree, latest master branch, tag v6.13-rc2
> https://github.com/torvalds/linux/blob/master/drivers/dma-buf/dma-fence.c#L467
>
> which was introduced by
> https://github.com/torvalds/linux/commit/967d226eaae8e40636d257bf8ae55d2c5a912f58

It is new. I did not see it in the AMD kernel either.

Previously I wanted to put the later dma_fence_put(ef) together with
dma_fence_signal(ef):

+	if (ef) {
+		dma_fence_signal(ef);
+		dma_fence_put(ef);
+	}

That seems neater.
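
To illustrate the grouping, here is a minimal userspace sketch, not kernel code. The mock_* names and the warning counter are hypothetical stand-ins for struct dma_fence, dma_fence_signal() and dma_fence_put(). The assumption is that signaling a NULL fence warns and returns -EINVAL, as on v6.13-rc2, while dropping a reference tolerates NULL, as dma_fence_put() does.

```c
#include <errno.h>
#include <stddef.h>

/* Userspace stand-in for struct dma_fence; not the kernel type. */
struct mock_fence {
	int refcount;
	int signaled;
};

/* Counts NULL-fence signals; stands in for the kernel warning. */
static int mock_warn_count;

/* Assumed behaviour: a NULL fence warns and returns -EINVAL. */
static int mock_fence_signal(struct mock_fence *f)
{
	if (!f) {
		mock_warn_count++;
		return -EINVAL;
	}
	f->signaled = 1;
	return 0;
}

/* Dropping a reference tolerates NULL, like dma_fence_put(). */
static void mock_fence_put(struct mock_fence *f)
{
	if (f)
		f->refcount--;
}

/* Hypothetical release helper: signal and put under one NULL check. */
static void mock_release(struct mock_fence *ef)
{
	if (ef) {
		mock_fence_signal(ef);
		mock_fence_put(ef);
	}
}
```

With this shape, releasing a process that never had a fence set up skips both calls, so the warning path is never reached.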

Regards
Xiaogang

> Thanks
> Lingshan
>
>> Regards,
>>    Felix
>>
>>> kfd_process->ef is initialized by kfd_process_device_init_vm()
>>> through an ioctl. That means the fence is NULL for a newly
>>> created kfd_process, and closing a kfd_process right
>>> after opening it triggers the warning.
>>>
>>> This commit conditionally signals the eviction fence
>>> in kfd_process_wq_release() only when it is available.
>>>
>>> [  503.660882] WARNING: CPU: 0 PID: 9 at drivers/dma-buf/dma-fence.c:467 dma_fence_signal+0x74/0xa0
>>> [  503.782940] Workqueue: kfd_process_wq kfd_process_wq_release [amdgpu]
>>> [  503.789640] RIP: 0010:dma_fence_signal+0x74/0xa0
>>> [  503.877620] Call Trace:
>>> [  503.880066]  <TASK>
>>> [  503.882168]  ? __warn+0xcd/0x260
>>> [  503.885407]  ? dma_fence_signal+0x74/0xa0
>>> [  503.889416]  ? report_bug+0x288/0x2d0
>>> [  503.893089]  ? handle_bug+0x53/0xa0
>>> [  503.896587]  ? exc_invalid_op+0x14/0x50
>>> [  503.900424]  ? asm_exc_invalid_op+0x16/0x20
>>> [  503.904616]  ? dma_fence_signal+0x74/0xa0
>>> [  503.908626]  kfd_process_wq_release+0x6b/0x370 [amdgpu]
>>> [  503.914081]  process_one_work+0x654/0x10a0
>>> [  503.918186]  worker_thread+0x6c3/0xe70
>>> [  503.921943]  ? srso_alias_return_thunk+0x5/0xfbef5
>>> [  503.926735]  ? srso_alias_return_thunk+0x5/0xfbef5
>>> [  503.931527]  ? __kthread_parkme+0x82/0x140
>>> [  503.935631]  ? __pfx_worker_thread+0x10/0x10
>>> [  503.939904]  kthread+0x2a8/0x380
>>> [  503.943132]  ? __pfx_kthread+0x10/0x10
>>> [  503.946882]  ret_from_fork+0x2d/0x70
>>> [  503.950458]  ? __pfx_kthread+0x10/0x10
>>> [  503.954210]  ret_from_fork_asm+0x1a/0x30
>>> [  503.958142]  </TASK>
>>> [  503.960328] ---[ end trace 0000000000000000 ]---
>>>
>>> Signed-off-by: Zhu Lingshan <lingshan.zhu at amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 3 ++-
>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> index 87cd52cf4ee9..47d36f43ee8c 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> @@ -1159,7 +1159,8 @@ static void kfd_process_wq_release(struct work_struct *work)
>>>   	 */
>>>   	synchronize_rcu();
>>>   	ef = rcu_access_pointer(p->ef);
>>> -	dma_fence_signal(ef);
>>> +	if (ef)
>>> +		dma_fence_signal(ef);
>>>   
>>>   	kfd_process_remove_sysfs(p);
>>>   