[PATCH] drm/amdgpu: Fix the dead lock issue.

Christian König christian.koenig at amd.com
Mon Sep 10 07:23:07 UTC 2018


Am 10.09.2018 um 09:19 schrieb Deng, Emily:
>> -----Original Message-----
>> From: Christian König <ckoenig.leichtzumerken at gmail.com>
>> Sent: Monday, September 10, 2018 3:06 PM
>> To: Deng, Emily <Emily.Deng at amd.com>; amd-gfx at lists.freedesktop.org
>> Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue.
>>
>> Am 10.09.2018 um 06:07 schrieb Emily Deng:
>>> It will ramdomly have the dead lock issue when test TDR:
>>> 1. amdgpu_device_handle_vram_lost gets the lock shadow_list_lock 2.
>>> amdgpu_bo_create locked the bo's resv lock 3. amdgpu_bo_create_shadow
>>> is waiting for the shadow_list_lock 4.
>>> amdgpu_device_recover_vram_from_shadow is waiting for the bo's resv
>>> lock.
>>>
>>> Signed-off-by: Emily Deng <Emily.Deng at amd.com>
>> Good catch, problem is the fix doesn't work like this because the lock won't be
>> dropped at this point in most cases.
> Sorry, could you explain more, why the lock won't be dropped after calling reservation_object_unlock?

reservation_object_unlock won't be called for most BOs.

E.g. the shadow is used for page directories and all page directories 
use the root reservation object and so have bp->resv set here.

Christian.

>> Instead you need to fix amdgpu_device_recover_vram_from_shadow to make
>> a local copy of the list.
>>
>> You can do this by grabbing a reference to each BO and moving it to a local
>> list.
>>
>> Regards,
>> Christian.
>>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 12 +++++++-----
>>>    1 file changed, 7 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> index de990bd..c75447d 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> @@ -557,12 +557,8 @@ static int amdgpu_bo_create_shadow(struct
>> amdgpu_device *adev,
>>>    	bp.resv = bo->tbo.resv;
>>>
>>>    	r = amdgpu_bo_do_create(adev, &bp, &bo->shadow);
>>> -	if (!r) {
>>> +	if (!r)
>>>    		bo->shadow->parent = amdgpu_bo_ref(bo);
>>> -		mutex_lock(&adev->shadow_list_lock);
>>> -		list_add_tail(&bo->shadow_list, &adev->shadow_list);
>>> -		mutex_unlock(&adev->shadow_list_lock);
>>> -	}
>>>
>>>    	return r;
>>>    }
>>> @@ -603,6 +599,12 @@ int amdgpu_bo_create(struct amdgpu_device
>> *adev,
>>>    		if (!bp->resv)
>>>    			reservation_object_unlock((*bo_ptr)->tbo.resv);
>>>
>>> +		if (!r) {
>>> +			mutex_lock(&adev->shadow_list_lock);
>>> +			list_add_tail(&(*bo_ptr)->shadow_list, &adev-
>>> shadow_list);
>>> +			mutex_unlock(&adev->shadow_list_lock);
>>> +		}
>>> +
>>>    		if (r)
>>>    			amdgpu_bo_unref(bo_ptr);
>>>    	}



More information about the amd-gfx mailing list