[PATCH] amdgpu: disable GPU reset if amdgpu.lockup_timeout=0
Andrey Grodzovsky
Andrey.Grodzovsky at amd.com
Tue Dec 12 12:58:04 UTC 2017
On 12/12/2017 04:01 AM, Christian König wrote:
> On 11.12.2017 at 22:29, Marek Olšák wrote:
>> From: Marek Olšák <marek.olsak at amd.com>
>>
>> Signed-off-by: Marek Olšák <marek.olsak at amd.com>
>> ---
>>
>> Is this really correct? I have no easy way to test it.
>
> It's a step in the right direction, but I would rather vote for
> something else:
>
> Instead of disabling the timeout by default we only disable the GPU
> reset/recovery.
>
> The idea is to add a new parameter amdgpu_gpu_recovery which makes
> amdgpu_gpu_recover only print an error and not touch the GPU
> at all (on bare metal systems).
>
> Then we finally set amdgpu_lockup_timeout to a non-zero value by
> default.
>
> Andrey, could you take care of this when you have time?
>
> Thanks,
> Christian.
Sure.
Thanks,
Andrey
>
>>
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 8d03baa..56c41cf 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -3018,20 +3018,24 @@ static int amdgpu_reset_sriov(struct amdgpu_device *adev, uint64_t *reset_flags,
>> *
>> * Attempt to reset the GPU if it has hung (all asics).
>> * Returns 0 for success or an error on failure.
>> */
>> int amdgpu_gpu_recover(struct amdgpu_device *adev, struct amdgpu_job *job)
>> {
>> struct drm_atomic_state *state = NULL;
>> uint64_t reset_flags = 0;
>> int i, r, resched;
>> + /* amdgpu.lockup_timeout=0 disables GPU reset. */
>> + if (amdgpu_lockup_timeout == 0)
>> + return 0;
>> +
>> if (!amdgpu_check_soft_reset(adev)) {
>> DRM_INFO("No hardware hang detected. Did some blocks stall?\n");
>> return 0;
>> }
>> dev_info(adev->dev, "GPU reset begin!\n");
>> mutex_lock(&adev->lock_reset);
>> atomic_inc(&adev->gpu_reset_counter);
>> adev->in_gpu_reset = 1;
>
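
For reference, the behaviour Christian proposes above can be modelled in
plain user-space C. This is only a sketch under stated assumptions, not
driver code: every sketch_* name, the struct, and the default values are
hypothetical, and letting SR-IOV virtual functions always recover is an
assumption read into the "(on bare metal systems)" remark. It shows the
timeout staying non-zero so hangs are still detected, while a separate
switch decides whether recovery touches the GPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the two module parameters (assumed defaults). */
static int sketch_gpu_recovery = 0;            /* 0 = report only, 1 = reset */
static unsigned sketch_lockup_timeout = 10000; /* ms; 0 disables detection */

/* Hypothetical, heavily reduced model of a device. */
struct sketch_device {
	const char *name;
	bool is_virtual;      /* models an SR-IOV virtual function */
	unsigned reset_count; /* resets actually attempted */
};

/* A job counts as hung only when the timeout is enabled and has elapsed. */
static bool sketch_job_timed_out(unsigned elapsed_ms)
{
	return sketch_lockup_timeout != 0 && elapsed_ms >= sketch_lockup_timeout;
}

/* Assumption: only bare metal honours the recovery switch. */
static bool sketch_should_recover(const struct sketch_device *dev)
{
	assert(dev != NULL);
	return dev->is_virtual || sketch_gpu_recovery != 0;
}

/* Returns 0 in both cases; with recovery off it only reports the hang. */
static int sketch_gpu_recover(struct sketch_device *dev)
{
	if (!sketch_should_recover(dev)) {
		fprintf(stderr, "%s: GPU hang detected, recovery disabled\n",
			dev->name);
		return 0;
	}

	/* A real driver would perform the reset here; the model just counts. */
	dev->reset_count++;
	return 0;
}
```

Compared with the early return on amdgpu_lockup_timeout == 0 in the
patch, keeping the two knobs separate means a hang is still reported
even when no reset is attempted.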