[PATCH] drm/amdgpu: correctly report gpu recover status

Christian König christian.koenig at amd.com
Wed Jan 1 11:17:30 UTC 2020


Hi Evan,

> But what I care about more (which is also the easiest way for me) is the correct return value of the API.
Well, that's exactly the point: the return value is not correct for the API.

For example, if the GPU reset function returned -EFAULT, your 
program that reads the debugfs file would crash with a segmentation 
fault. That is not correct behavior.

In other words, the result of the GPU reset can't be used as the result 
of the debugfs read.
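
A minimal user-space sketch of that separation, under stated assumptions: 
fake_seq_file, fake_gpu_recover() and fake_debugfs_gpu_recover() are 
hypothetical stand-ins for struct seq_file, amdgpu_device_gpu_recover() 
and the debugfs handler, not the kernel code itself. The reset result is 
written into the file contents, while the handler returns 0 so the read 
succeeds.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's struct seq_file: a plain text buffer. */
struct fake_seq_file {
	char buf[64];
	size_t len;
};

/* Hypothetical stand-in for amdgpu_device_gpu_recover(); pretends the reset failed. */
static int fake_gpu_recover(void)
{
	return -EFAULT;
}

/*
 * Report the reset result as text in the file contents and return 0,
 * so the status of the read is independent of the status of the reset.
 */
static int fake_debugfs_gpu_recover(struct fake_seq_file *m)
{
	size_t room = sizeof(m->buf) - m->len;
	int r = fake_gpu_recover();
	int n = snprintf(m->buf + m->len, room, "gpu recover %d\n", r);

	/* Only advance when the whole line fit into the buffer. */
	if (n > 0 && (size_t)n < room)
		m->len += (size_t)n;

	return 0;
}
```

A test loop could then read the file and parse the number after 
"gpu recover" to decide whether to stop, instead of relying on the exit 
status of the read.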

Regards,
Christian.

On 19.12.19 at 02:48, Quan, Evan wrote:
> Hi Christian,
>
> Here is some background for this change:
> I'm debugging a random failure issue on BACO reset.
> I used a while loop to run continuous BACO reset tests, hoping it would exit immediately when a failure occurred.
> However, due to the wrong return value, it did not. And as you can imagine, the failure scene was ruined.
>
> I can add this "seq_printf(m, "gpu recover %d\n", r);".
> But what I care about more (which is also the easiest way for me) is the correct return value of the API.
>
> Regards,
> Evan
>> -----Original Message-----
>> From: Christian König <ckoenig.leichtzumerken at gmail.com>
>> Sent: Wednesday, December 18, 2019 5:57 PM
>> To: Quan, Evan <Evan.Quan at amd.com>; amd-gfx at lists.freedesktop.org
>> Subject: Re: [PATCH] drm/amdgpu: correctly report gpu recover status
>>
>> On 18.12.19 at 04:25, Evan Quan wrote:
>>> Knowing whether gpu recovery was performed successfully or not is
>>> important for our BACO development.
>>>
>>> Change-Id: I0e3ca4dcb65a053eb26bc55ad7431e4a42e160de
>>> Signed-off-by: Evan Quan <evan.quan at amd.com>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 4 +---
>>>    1 file changed, 1 insertion(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>> index e9efee04ca23..5dff5c0dd882 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>> @@ -743,9 +743,7 @@ static int amdgpu_debugfs_gpu_recover(struct seq_file *m, void *data)
>>>    	struct amdgpu_device *adev = dev->dev_private;
>>>
>>>    	seq_printf(m, "gpu recover\n");
>>> -	amdgpu_device_gpu_recover(adev, NULL);
>>> -
>>> -	return 0;
>>> +	return amdgpu_device_gpu_recover(adev, NULL);
>> NAK, what we could do here is the following:
>>
>> r = amdgpu_device_gpu_recover(....);
>> seq_printf(m, "gpu recover %d\n", r);
>>
>> But returning the error code from the GPU recovery to userspace doesn't make
>> too much sense.
>>
>> Christian.
>>
>>>    }
>>>
>>>    static const struct drm_info_list amdgpu_debugfs_fence_list[] = {


