[PATCH] amdgpu: disable GPU reset if amdgpu.lockup_timeout=0

Christian König ckoenig.leichtzumerken at gmail.com
Tue Dec 12 16:36:57 UTC 2017


On 12.12.2017 at 15:57, Marek Olšák wrote:
> On Tue, Dec 12, 2017 at 10:01 AM, Christian König
> <ckoenig.leichtzumerken at gmail.com> wrote:
>> On 11.12.2017 at 22:29, Marek Olšák wrote:
>>> From: Marek Olšák <marek.olsak at amd.com>
>>>
>>> Signed-off-by: Marek Olšák <marek.olsak at amd.com>
>>> ---
>>>
>>> Is this really correct? I have no easy way to test it.
>>
>> It's a step in the right direction, but I would rather vote for something
>> else:
>>
>> Instead of disabling the timeout by default, we only disable the GPU
>> reset/recovery.
>>
>> The idea is to add a new parameter amdgpu_gpu_recovery which makes
>> amdgpu_gpu_recover only print an error and not touch the GPU at all
>> (on bare metal systems).
>>
>> Then we finally set amdgpu_lockup_timeout to a non-zero value by
>> default.
>>
>> Andrey, could you take care of this when you have time?
> I don't understand this.
>
> Why can't we keep the previous behavior where amdgpu.lockup_timeout=0
> disabled GPU reset? Why do we have to add another option for the same
> thing?

lockup_timeout=0 never disabled the GPU reset; it only disabled the timeout.

You could still trigger a reset manually, and invalid commands,
invalid register writes and requests from the SRIOV hypervisor could
trigger one as well.

And as Monk explained, GPU resets are mandatory for SRIOV; you can't
disable them at all in that case.

In addition, we probably still want the error message saying that
something timed out, just without touching the hardware in any way.
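
To make the distinction concrete, here is a rough sketch of the decision
logic I have in mind. It is standalone C and not actual driver code: the
function names job_timed_out and decide_recovery, the enum, and the
boolean arguments are all made up for illustration, and
gpu_recovery_enabled only stands in for the proposed amdgpu_gpu_recovery
parameter.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical result type; the real driver has no such enum. */
enum recover_action {
    RECOVER_REPORT_ONLY, /* print an error, leave the hardware alone */
    RECOVER_RESET        /* perform the GPU reset/recovery */
};

/*
 * Models the timeout check only. A lockup timeout of 0 means "never time
 * out"; it says nothing about resets requested by other sources.
 */
bool job_timed_out(unsigned int lockup_timeout_ms, unsigned int elapsed_ms)
{
    if (lockup_timeout_ms == 0)
        return false;
    return elapsed_ms >= lockup_timeout_ms;
}

/*
 * Models what happens once a recovery has been requested, whatever the
 * source was (timeout, manual trigger, invalid command, hypervisor).
 * gpu_recovery_enabled is an assumed stand-in for the proposed parameter.
 */
enum recover_action decide_recovery(bool is_sriov_vf, bool gpu_recovery_enabled)
{
    /* Under SRIOV the reset is mandatory, so the parameter is ignored. */
    if (is_sriov_vf)
        return RECOVER_RESET;

    /* Bare metal with recovery disabled: report, but do not touch the GPU. */
    if (!gpu_recovery_enabled) {
        fprintf(stderr, "GPU recovery requested but disabled, hardware left untouched\n");
        return RECOVER_REPORT_ONLY;
    }

    return RECOVER_RESET;
}
```

The point of keeping the two functions separate is that the timeout only
decides whether a recovery gets requested at all, while the new parameter
decides what a requested recovery actually does.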

Regards,
Christian.

>
> Marek