[PATCH] drm/amdgpu: Fix user queue deadlock by reordering mutex locking
Jesse.Zhang
Jesse.Zhang at amd.com
Fri May 9 09:21:13 UTC 2025
This resolves a deadlock between user queue management and GPU reset
paths by enforcing consistent lock ordering.
The deadlock occurred when:
1. Process exit path (amdgpu_userq_mgr_fini) would:
- Take uqm->userq_mutex
- Then try to take adev->userq_mutex for list operations
2. GPU reset path (amdgpu_userq_pre_reset) would:
- Take adev->userq_mutex first (for list traversal)
- Then take uqm->userq_mutex
The solution establishes a strict top-down locking order:
1. Always take adev->userq_mutex before any uqm->userq_mutex
2. Maintain this order consistently across all code paths
Changes made:
- Reordered locking in amdgpu_userq_mgr_fini() to take device lock first
- Kept existing proper order in amdgpu_userq_pre_reset()
- Simplified the fini flow by removing redundant operations
This prevents circular dependencies while maintaining thread safety
during both normal operation and GPU reset scenarios.
Fixes: e274db8c826 ("drm/amdgpu: store userq_managers in a list in adev")
Signed-off-by: Jesse Zhang <Jesse.Zhang at amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
index 1c0ddec06280..0f1cb6bc63db 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
@@ -879,22 +879,23 @@ void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr)
cancel_delayed_work_sync(&userq_mgr->resume_work);
+ mutex_lock(&adev->userq_mutex);
mutex_lock(&userq_mgr->userq_mutex);
idr_for_each_entry(&userq_mgr->userq_idr, queue, queue_id) {
amdgpu_userq_wait_for_last_fence(userq_mgr, queue);
amdgpu_userq_unmap_helper(userq_mgr, queue);
amdgpu_userq_cleanup(userq_mgr, queue, queue_id);
}
- mutex_lock(&adev->userq_mutex);
+
list_for_each_entry_safe(uqm, tmp, &adev->userq_mgr_list, list) {
if (uqm == userq_mgr) {
list_del(&uqm->list);
break;
}
}
- mutex_unlock(&adev->userq_mutex);
idr_destroy(&userq_mgr->userq_idr);
mutex_unlock(&userq_mgr->userq_mutex);
+ mutex_unlock(&adev->userq_mutex);
mutex_destroy(&userq_mgr->userq_mutex);
}
--
2.49.0
More information about the amd-gfx
mailing list