[PATCH] drm/amdgpu: Fix the dead lock issue.
Deng, Emily
Emily.Deng at amd.com
Tue Sep 11 03:23:26 UTC 2018
>-----Original Message-----
>From: Zhou, David(ChunMing)
>Sent: Tuesday, September 11, 2018 11:03 AM
>To: Deng, Emily <Emily.Deng at amd.com>; amd-gfx at lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: Fix the dead lock issue.
>
>
>
>On 2018-09-11 10:51, Emily Deng wrote:
>> A deadlock occurs randomly when testing TDR:
>> 1. amdgpu_device_handle_vram_lost takes shadow_list_lock.
>> 2. amdgpu_bo_create takes the bo's resv lock.
>> 3. amdgpu_bo_create_shadow waits for shadow_list_lock.
>> 4. amdgpu_device_recover_vram_from_shadow waits for the bo's resv lock.
>>
>> v2:
>> Make a local copy of the list
>>
>> Signed-off-by: Emily Deng <Emily.Deng at amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21 ++++++++++++++++++++-
>> 1 file changed, 20 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 2a21267..8c81404 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -3105,6 +3105,9 @@ static int amdgpu_device_handle_vram_lost(struct amdgpu_device *adev)
>> long r = 1;
>> int i = 0;
>> long tmo;
>> + struct list_head local_shadow_list;
>> +
>> + INIT_LIST_HEAD(&local_shadow_list);
>>
>> if (amdgpu_sriov_runtime(adev))
>> tmo = msecs_to_jiffies(8000);
>> @@ -3112,8 +3115,19 @@ static int amdgpu_device_handle_vram_lost(struct amdgpu_device *adev)
>> tmo = msecs_to_jiffies(100);
>>
>> DRM_INFO("recover vram bo from shadow start\n");
>> +
>> + mutex_lock(&adev->shadow_list_lock);
>> + list_splice_init(&adev->shadow_list, &local_shadow_list);
>> + mutex_unlock(&adev->shadow_list_lock);
>> +
>> +
>> mutex_lock(&adev->shadow_list_lock);
>local_shadow_list is a local variable, so I don't think it needs a lock at all; no one
>else changes it. Otherwise looks good to me.
A bo whose shadow_list entry is now on local_shadow_list may be destroyed concurrently. If it is already in amdgpu_bo_destroy,
the destroy path unlinks it and so modifies local_shadow_list. That is why shadow_list_lock still needs to be held.
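
To illustrate the point, here is a minimal userspace sketch of the pattern, not the driver code. A pthread mutex stands in for shadow_list_lock, and struct node, splice_all, list_count, bo_destroy and recover_all are hypothetical names made up for this example. It assumes the entry currently being processed is not unlinked while the lock is dropped, which the sketch does not enforce.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Minimal circular doubly linked list, loosely modeled on the kernel's list_head. */
struct node { struct node *prev, *next; };

static void list_init(struct node *h) { h->prev = h->next = h; }
static int list_is_empty(const struct node *h) { return h->next == h; }

static void list_add_tail(struct node *n, struct node *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Unlink n from whichever list currently holds it. */
static void list_del_init(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* Move every entry of src to the tail of dst and leave src empty. */
static void splice_all(struct node *src, struct node *dst)
{
    if (list_is_empty(src))
        return;
    src->next->prev = dst->prev;
    dst->prev->next = src->next;
    src->prev->next = dst;
    dst->prev = src->prev;
    list_init(src);
}

static size_t list_count(const struct node *h)
{
    size_t n = 0;
    for (const struct node *p = h->next; p != h; p = p->next)
        n++;
    return n;
}

/* link is the first member, so a node pointer can be cast to its shadow_bo. */
struct shadow_bo { struct node link; int recovered; };

static pthread_mutex_t shadow_list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node shadow_list = { &shadow_list, &shadow_list };

/* Stand-in for the destroy path: it unlinks the entry from whatever list
 * it is on, which may be the caller's local list, hence the lock. */
static void bo_destroy(struct shadow_bo *bo)
{
    pthread_mutex_lock(&shadow_list_lock);
    list_del_init(&bo->link);
    pthread_mutex_unlock(&shadow_list_lock);
}

/* Splice to a local list, drop the lock around the per-entry work,
 * retake it before touching any list pointer, then splice back. */
static int recover_all(void)
{
    struct node local, *pos;
    int n = 0;

    list_init(&local);
    pthread_mutex_lock(&shadow_list_lock);
    splice_all(&shadow_list, &local);
    pos = local.next;
    while (pos != &local) {
        pthread_mutex_unlock(&shadow_list_lock);
        ((struct shadow_bo *)pos)->recovered = 1; /* work done without the list lock */
        n++;
        pthread_mutex_lock(&shadow_list_lock);
        pos = pos->next; /* read under the lock; assumes pos was not unlinked meanwhile */
    }
    splice_all(&local, &shadow_list);
    pthread_mutex_unlock(&shadow_list_lock);
    return n;
}
```

The local list is private only in name: every entry on it can still be reached through its own bo, so any walk over it has to hold the same lock the destroy path takes.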
Best wishes
Emily Deng
>Thanks,
>David Zhou
>> - list_for_each_entry_safe(bo, tmp, &adev->shadow_list, shadow_list) {
>> + list_for_each_entry_safe(bo, tmp, &local_shadow_list, shadow_list) {
>> + mutex_unlock(&adev->shadow_list_lock);
>> +
>> + if (!bo)
>> + continue;
>> +
>> next = NULL;
>> amdgpu_device_recover_vram_from_shadow(adev, ring, bo, &next);
>> if (fence) {
>> @@ -3132,9 +3146,14 @@ static int amdgpu_device_handle_vram_lost(struct amdgpu_device *adev)
>>
>> dma_fence_put(fence);
>> fence = next;
>> + mutex_lock(&adev->shadow_list_lock);
>> }
>> mutex_unlock(&adev->shadow_list_lock);
>>
>> + mutex_lock(&adev->shadow_list_lock);
>> + list_splice_init(&local_shadow_list, &adev->shadow_list);
>> + mutex_unlock(&adev->shadow_list_lock);
>> +
>> if (fence) {
>> r = dma_fence_wait_timeout(fence, false, tmo);
>> if (r == 0)