[PATCH] amdgpu: disable GPU reset if amdgpu.lockup_timeout=0

Andrey Grodzovsky Andrey.Grodzovsky at amd.com
Tue Dec 12 12:58:04 UTC 2017



On 12/12/2017 04:01 AM, Christian König wrote:
> On 11.12.2017 at 22:29, Marek Olšák wrote:
>> From: Marek Olšák <marek.olsak at amd.com>
>>
>> Signed-off-by: Marek Olšák <marek.olsak at amd.com>
>> ---
>>
>> Is this really correct? I have no easy way to test it.
>
> It's a step in the right direction, but I would rather vote for 
> something else:
>
> Instead of disabling the timeout by default, we only disable the GPU 
> reset/recovery.
>
> The idea is to add a new parameter, amdgpu_gpu_recovery, which makes 
> amdgpu_gpu_recover only print an error and not touch the GPU at all 
> (on bare metal systems).
>
> Then we can finally set amdgpu_lockup_timeout to a nonzero value by 
> default.
>
> Andrey, could you take care of this when you have time?
>
> Thanks,
> Christian.

Sure.
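
To make sure I read the proposal correctly, here is a rough userspace
sketch of the intended control flow. It is not driver code: the struct,
the function name sketch_gpu_recover, the parameter defaults and the
message text are all placeholders I made up for illustration. Only the
parameter names amdgpu_gpu_recovery and amdgpu_lockup_timeout come from
the discussion above.

```c
#include <stdio.h>

/* Stand-ins for the module parameters; the defaults are assumptions. */
static int amdgpu_gpu_recovery = 0;       /* assumed: 0 = report only, 1 = reset */
static int amdgpu_lockup_timeout = 10000; /* assumed nonzero default, in ms */

/* Hypothetical placeholder for struct amdgpu_device. */
struct sketch_device {
	int reset_count; /* how many (simulated) resets were performed */
};

/* Hypothetical stand-in for amdgpu_gpu_recover(). */
static int sketch_gpu_recover(struct sketch_device *dev)
{
	if (!amdgpu_gpu_recovery) {
		/* Recovery disabled: report the hang, leave the GPU alone. */
		fprintf(stderr, "GPU job timed out after %d ms, recovery disabled\n",
			amdgpu_lockup_timeout);
		return 0;
	}

	/* Recovery enabled: the real function would reset the GPU here. */
	dev->reset_count++;
	return 0;
}
```

With this split the timeout can stay enabled by default, so hangs are
still reported, while the reset itself is opt-in.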

Thanks,
Andrey

>
>>
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 8d03baa..56c41cf 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -3018,20 +3018,24 @@ static int amdgpu_reset_sriov(struct amdgpu_device *adev, uint64_t *reset_flags,
>>    *
>>    * Attempt to reset the GPU if it has hung (all asics).
>>    * Returns 0 for success or an error on failure.
>>    */
>>   int amdgpu_gpu_recover(struct amdgpu_device *adev, struct amdgpu_job *job)
>>   {
>>       struct drm_atomic_state *state = NULL;
>>       uint64_t reset_flags = 0;
>>       int i, r, resched;
>>
>> +    /* amdgpu.lockup_timeout=0 disables GPU reset. */
>> +    if (amdgpu_lockup_timeout == 0)
>> +        return 0;
>> +
>>       if (!amdgpu_check_soft_reset(adev)) {
>>           DRM_INFO("No hardware hang detected. Did some blocks stall?\n");
>>           return 0;
>>       }
>>
>>       dev_info(adev->dev, "GPU reset begin!\n");
>>
>>       mutex_lock(&adev->lock_reset);
>>       atomic_inc(&adev->gpu_reset_counter);
>>       adev->in_gpu_reset = 1;
>


