[PATCH] drm/amdgpu: amdgpu_device_recover_vram always failed if only one node in shadow_list

Christian König ckoenig.leichtzumerken at gmail.com
Tue Apr 2 07:01:24 UTC 2019


Yeah, agree that is much closer to a correct solution.

But even better would be to correctly update the timeout as well, e.g:

tmo = dma_fence_wait_timeout(fence, false, tmo);
dma_fence_put(fence);
fence = next;
if (tmo == 0)
     r = -ETIMEDOUT;
     break
} else if (tmo < 0) {
     r = tmo;
     break;
}

That we restart the timeout for each wait looks like a rather 
problematic bug to me as well.

Christian.

Am 02.04.19 um 06:03 schrieb Deng, Emily:
> Maybe it will be better to add follow check, and change “if (r <= 0 || tmo <= 0) " to "if (r <0 || tmo <= 0)".
> 	r = dma_fence_wait_timeout(f, false, timeout);
> 	if (r == 0) {
> 		r = -ETIMEDOUT;
> 		break;
> 	} else if (r < 0) {
> 		break;
> 	}
>
> Best wishes
> Emily Deng
>
>
>> -----Original Message-----
>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of wentalou
>> Sent: Monday, April 1, 2019 4:59 PM
>> To: amd-gfx at lists.freedesktop.org
>> Cc: Lou, Wentao <Wentao.Lou at amd.com>
>> Subject: [PATCH] drm/amdgpu: amdgpu_device_recover_vram always failed if
>> only one node in shadow_list
>>
>> amdgpu_bo_restore_shadow would assign zero to r if succeeded.
>> r would remain zero if there is only one node in shadow_list.
>> current code would always return failure when r <= 0.
>>
>> Change-Id: Iae6880e7c78b71fde6a6754c69665c2e312a80a5
>> Signed-off-by: Wentao Lou <Wentao.Lou at amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++++-
>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index c4c61e9..5cf21a4 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -3171,6 +3171,7 @@ static int amdgpu_device_recover_vram(struct
>> amdgpu_device *adev)
>> 	struct dma_fence *fence = NULL, *next = NULL;
>> 	struct amdgpu_bo *shadow;
>> 	long r = 1, tmo;
>> +	bool single_shadow = false;
>>
>> 	if (amdgpu_sriov_runtime(adev))
>> 		tmo = msecs_to_jiffies(8000);
>> @@ -3194,10 +3195,12 @@ static int amdgpu_device_recover_vram(struct
>> amdgpu_device *adev)
>> 			r = dma_fence_wait_timeout(fence, false, tmo);
>> 			dma_fence_put(fence);
>> 			fence = next;
>> +			single_shadow = false;
>> 			if (r <= 0)
>> 				break;
>> 		} else {
>> 			fence = next;
>> +			single_shadow = true;
>> 		}
>> 	}
>> 	mutex_unlock(&adev->shadow_list_lock);
>> @@ -3206,7 +3209,8 @@ static int amdgpu_device_recover_vram(struct
>> amdgpu_device *adev)
>> 		tmo = dma_fence_wait_timeout(fence, false, tmo);
>> 	dma_fence_put(fence);
>>
>> -	if (r <= 0 || tmo <= 0) {
>> +	/* r would be zero even if amdgpu_bo_restore_shadow succeeded when
>> single shadow in list */
>> +	if (r < 0 || (r == 0 && !single_shadow) || tmo <= 0) {
>> 		DRM_ERROR("recover vram bo from shadow failed\n");
>> 		return -EIO;
>> 	}
>> --
>> 2.7.4
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx



More information about the amd-gfx mailing list