[PATCH] drm/amdgpu: Fix the dead lock issue.

Deng, Emily Emily.Deng at amd.com
Mon Sep 10 07:57:40 UTC 2018


>-----Original Message-----
>From: Koenig, Christian
>Sent: Monday, September 10, 2018 3:23 PM
>To: Deng, Emily <Emily.Deng at amd.com>; amd-gfx at lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue.
>
>Am 10.09.2018 um 09:19 schrieb Deng, Emily:
>>> -----Original Message-----
>>> From: Christian König <ckoenig.leichtzumerken at gmail.com>
>>> Sent: Monday, September 10, 2018 3:06 PM
>>> To: Deng, Emily <Emily.Deng at amd.com>; amd-gfx at lists.freedesktop.org
>>> Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue.
>>>
>>> Am 10.09.2018 um 06:07 schrieb Emily Deng:
>>>> It will ramdomly have the dead lock issue when test TDR:
>>>> 1. amdgpu_device_handle_vram_lost gets the lock shadow_list_lock 2.
>>>> amdgpu_bo_create locked the bo's resv lock 3.
>>>> amdgpu_bo_create_shadow is waiting for the shadow_list_lock 4.
>>>> amdgpu_device_recover_vram_from_shadow is waiting for the bo's resv
>>>> lock.
>>>>
>>>> Signed-off-by: Emily Deng <Emily.Deng at amd.com>
>>> Good catch, problem is the fix doesn't work like this because the
>>> lock won't be dropped at this point in most cases.
>> Sorry, could you explain more, why the lock won't be dropped after calling
>reservation_object_unlock?
>
>reservation_object_unlock won't be called for most BOs.
>
>E.g. the shadow is used for page directories and all page directories use the
>root reservation object and so have bp->resv set here.
Thanks, will modify as your suggestion.

Best wishes
Emily Deng
>Christian.
>
>>> Instead you need to fix amdgpu_device_recover_vram_from_shadow to
>>> make a local copy of the list.
>>>
>>> You can do this by grabbing a reference to each BO and moving it to a
>>> local list.
>>>
>>> Regards,
>>> Christian.
>>>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 12 +++++++-----
>>>>    1 file changed, 7 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>> index de990bd..c75447d 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>> @@ -557,12 +557,8 @@ static int amdgpu_bo_create_shadow(struct
>>> amdgpu_device *adev,
>>>>    	bp.resv = bo->tbo.resv;
>>>>
>>>>    	r = amdgpu_bo_do_create(adev, &bp, &bo->shadow);
>>>> -	if (!r) {
>>>> +	if (!r)
>>>>    		bo->shadow->parent = amdgpu_bo_ref(bo);
>>>> -		mutex_lock(&adev->shadow_list_lock);
>>>> -		list_add_tail(&bo->shadow_list, &adev->shadow_list);
>>>> -		mutex_unlock(&adev->shadow_list_lock);
>>>> -	}
>>>>
>>>>    	return r;
>>>>    }
>>>> @@ -603,6 +599,12 @@ int amdgpu_bo_create(struct amdgpu_device
>>> *adev,
>>>>    		if (!bp->resv)
>>>>    			reservation_object_unlock((*bo_ptr)->tbo.resv);
>>>>
>>>> +		if (!r) {
>>>> +			mutex_lock(&adev->shadow_list_lock);
>>>> +			list_add_tail(&(*bo_ptr)->shadow_list, &adev-
>>>> shadow_list);
>>>> +			mutex_unlock(&adev->shadow_list_lock);
>>>> +		}
>>>> +
>>>>    		if (r)
>>>>    			amdgpu_bo_unref(bo_ptr);
>>>>    	}



More information about the amd-gfx mailing list