[PATCH] drm/amdgpu: Fix the dead lock issue.
Christian König
christian.koenig at amd.com
Mon Sep 10 07:23:07 UTC 2018
Am 10.09.2018 um 09:19 schrieb Deng, Emily:
>> -----Original Message-----
>> From: Christian König <ckoenig.leichtzumerken at gmail.com>
>> Sent: Monday, September 10, 2018 3:06 PM
>> To: Deng, Emily <Emily.Deng at amd.com>; amd-gfx at lists.freedesktop.org
>> Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue.
>>
>> Am 10.09.2018 um 06:07 schrieb Emily Deng:
>>> It will ramdomly have the dead lock issue when test TDR:
>>> 1. amdgpu_device_handle_vram_lost gets the lock shadow_list_lock 2.
>>> amdgpu_bo_create locked the bo's resv lock 3. amdgpu_bo_create_shadow
>>> is waiting for the shadow_list_lock 4.
>>> amdgpu_device_recover_vram_from_shadow is waiting for the bo's resv
>>> lock.
>>>
>>> Signed-off-by: Emily Deng <Emily.Deng at amd.com>
>> Good catch, problem is the fix doesn't work like this because the lock won't be
>> dropped at this point in most cases.
> Sorry, could you explain more, why the lock won't be dropped after calling reservation_object_unlock?
reservation_object_unlock won't be called for most BOs.
E.g. the shadow is used for page directories and all page directories
use the root reservation object and so have bp->resv set here.
Christian.
>> Instead you need to fix amdgpu_device_recover_vram_from_shadow to make
>> a local copy of the list.
>>
>> You can do this by grabbing a reference to each BO and moving it to a local
>> list.
>>
>> Regards,
>> Christian.
>>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 12 +++++++-----
>>> 1 file changed, 7 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> index de990bd..c75447d 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> @@ -557,12 +557,8 @@ static int amdgpu_bo_create_shadow(struct
>> amdgpu_device *adev,
>>> bp.resv = bo->tbo.resv;
>>>
>>> r = amdgpu_bo_do_create(adev, &bp, &bo->shadow);
>>> - if (!r) {
>>> + if (!r)
>>> bo->shadow->parent = amdgpu_bo_ref(bo);
>>> - mutex_lock(&adev->shadow_list_lock);
>>> - list_add_tail(&bo->shadow_list, &adev->shadow_list);
>>> - mutex_unlock(&adev->shadow_list_lock);
>>> - }
>>>
>>> return r;
>>> }
>>> @@ -603,6 +599,12 @@ int amdgpu_bo_create(struct amdgpu_device
>> *adev,
>>> if (!bp->resv)
>>> reservation_object_unlock((*bo_ptr)->tbo.resv);
>>>
>>> + if (!r) {
>>> + mutex_lock(&adev->shadow_list_lock);
>>> + list_add_tail(&(*bo_ptr)->shadow_list, &adev-
>>> shadow_list);
>>> + mutex_unlock(&adev->shadow_list_lock);
>>> + }
>>> +
>>> if (r)
>>> amdgpu_bo_unref(bo_ptr);
>>> }
More information about the amd-gfx
mailing list