[PATCH] drm/amdgpu: Fix user queue deadlock by reordering mutex locking

Yadav, Arvind arvyadav at amd.com
Fri May 9 12:18:11 UTC 2025


Reviewed-by: Arvind Yadav <Arvind.Yadav at amd.com>

On 5/9/2025 4:48 PM, Christian König wrote:
> On 5/9/25 11:21, Jesse.Zhang wrote:
>> This resolves a deadlock between user queue management and GPU reset
>> paths by enforcing consistent lock ordering.
>>
>> The deadlock occurred when:
>>
>> 1. Process exit path (amdgpu_userq_mgr_fini) would:
>>     - Take uqm->userq_mutex
>>     - Then try to take adev->userq_mutex for list operations
>>
>> 2. GPU reset path (amdgpu_userq_pre_reset) would:
>>     - Take adev->userq_mutex first (for list traversal)
>>     - Then take uqm->userq_mutex
>>
>> The solution establishes a strict top-down locking order:
>> 1. Always take adev->userq_mutex before any uqm->userq_mutex
>> 2. Maintain this order consistently across all code paths
>>
>> Changes made:
>> - Reordered locking in amdgpu_userq_mgr_fini() to take device lock first
>> - Kept existing proper order in amdgpu_userq_pre_reset()
>> - Simplified the fini flow by removing redundant operations
>>
>> This prevents circular dependencies while maintaining thread safety
>> during both normal operation and GPU reset scenarios.
>>
>> Fixes: e274db8c826 ("drm/amdgpu: store userq_managers in a list in adev")
>>
>> Signed-off-by: Jesse Zhang <Jesse.Zhang at amd.com>
> Reviewed-by: Christian König <christian.koenig at amd.com>
>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
>> index 1c0ddec06280..0f1cb6bc63db 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
>> @@ -879,22 +879,23 @@ void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr)
>>   
>>   	cancel_delayed_work_sync(&userq_mgr->resume_work);
>>   
>> +	mutex_lock(&adev->userq_mutex);
>>   	mutex_lock(&userq_mgr->userq_mutex);
>>   	idr_for_each_entry(&userq_mgr->userq_idr, queue, queue_id) {
>>   		amdgpu_userq_wait_for_last_fence(userq_mgr, queue);
>>   		amdgpu_userq_unmap_helper(userq_mgr, queue);
>>   		amdgpu_userq_cleanup(userq_mgr, queue, queue_id);
>>   	}
>> -	mutex_lock(&adev->userq_mutex);
>> +
>>   	list_for_each_entry_safe(uqm, tmp, &adev->userq_mgr_list, list) {
>>   		if (uqm == userq_mgr) {
>>   			list_del(&uqm->list);
>>   			break;
>>   		}
>>   	}
>> -	mutex_unlock(&adev->userq_mutex);
>>   	idr_destroy(&userq_mgr->userq_idr);
>>   	mutex_unlock(&userq_mgr->userq_mutex);
>> +	mutex_unlock(&adev->userq_mutex);
>>   	mutex_destroy(&userq_mgr->userq_mutex);
>>   }
>>   


More information about the amd-gfx mailing list