[PATCH] amdgpu: disable GPU reset if amdgpu.lockup_timeout=0
Christian König
ckoenig.leichtzumerken at gmail.com
Tue Dec 12 09:01:15 UTC 2017
On 11.12.2017 at 22:29, Marek Olšák wrote:
> From: Marek Olšák <marek.olsak at amd.com>
>
> Signed-off-by: Marek Olšák <marek.olsak at amd.com>
> ---
>
> Is this really correct? I have no easy way to test it.
It's a step in the right direction, but I would rather vote for
something else:
Instead of disabling the timeout by default, we only disable GPU
reset/recovery.
The idea is to add a new parameter, amdgpu_gpu_recovery, which makes
amdgpu_gpu_recover only print an error and leave the GPU untouched
(on bare-metal systems).
Then we can finally set amdgpu_lockup_timeout to a non-zero value by
default.
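A minimal userspace sketch of the proposed behaviour, for illustration only. It assumes a hypothetical integer parameter amdgpu_gpu_recovery (0 = only report the hang, non-zero = perform the reset) and a hypothetical stand-in struct fake_gpu in place of struct amdgpu_device. In the driver the message would go through the DRM/dev logging helpers rather than fprintf, and what the function should return when recovery is disabled is not specified in the proposal; returning 0 here is an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the proposed amdgpu_gpu_recovery
 * parameter: 0 = only report hangs, non-zero = reset the GPU. */
static int amdgpu_gpu_recovery = 0;

/* Hypothetical stand-in for struct amdgpu_device; it only counts
 * how many resets would have been performed. */
struct fake_gpu {
	int reset_count;
};

/* Sketch of the proposed amdgpu_gpu_recover() behaviour. Returning 0
 * when recovery is disabled is an assumption, not part of the proposal. */
static int gpu_recover_sketch(struct fake_gpu *gpu)
{
	assert(gpu != NULL);

	if (!amdgpu_gpu_recovery) {
		/* Report the hang but leave the hardware untouched. */
		fprintf(stderr, "GPU hang detected, recovery disabled\n");
		return 0;
	}

	/* Placeholder for the real reset/recovery sequence. */
	gpu->reset_count++;
	return 0;
}
```

Because the early return no longer depends on amdgpu_lockup_timeout, the timeout can stay enabled (so hangs are still detected and reported) while the reset itself is switched off separately.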
Andrey, could you take care of this when you have time?
Thanks,
Christian.
>
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 8d03baa..56c41cf 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3018,20 +3018,24 @@ static int amdgpu_reset_sriov(struct amdgpu_device *adev, uint64_t *reset_flags,
>  *
>  * Attempt to reset the GPU if it has hung (all asics).
>  * Returns 0 for success or an error on failure.
>  */
>  int amdgpu_gpu_recover(struct amdgpu_device *adev, struct amdgpu_job *job)
>  {
>  	struct drm_atomic_state *state = NULL;
>  	uint64_t reset_flags = 0;
>  	int i, r, resched;
>
> +	/* amdgpu.lockup_timeout=0 disables GPU reset. */
> +	if (amdgpu_lockup_timeout == 0)
> +		return 0;
> +
>  	if (!amdgpu_check_soft_reset(adev)) {
>  		DRM_INFO("No hardware hang detected. Did some blocks stall?\n");
>  		return 0;
>  	}
>
>  	dev_info(adev->dev, "GPU reset begin!\n");
>
>  	mutex_lock(&adev->lock_reset);
>  	atomic_inc(&adev->gpu_reset_counter);
>  	adev->in_gpu_reset = 1;