[PATCH] drm/amdgpu: disable job timeout on GPU reset disabled
Christian König
ckoenig.leichtzumerken at gmail.com
Mon Mar 19 09:42:19 UTC 2018
On 19.03.2018 at 07:08, Evan Quan wrote:
> Under some heavy compute workloads (e.g. the dgemm test), the ASIC
> can take more than 10 seconds to finish a single dispatched job,
> which triggers the job timeout. The resulting message is confusing,
> although it does not seem to cause any real problems.
> As a quick workaround, disable the timeout when GPU reset
> is disabled.
NAK, I enabled those warnings intentionally, even when GPU recovery is
disabled, so that the logs give a hint about what goes wrong.
Please just increase the timeout for the compute queues and/or add a
separate timeout for them.
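A minimal sketch of what a separate compute timeout could look like.
All names here (ring_type, timeout_config, job_timeout_ms) are
hypothetical and are not the driver's actual API; the 10000 ms figure
matches the default visible in the quoted patch, and any compute value
would have to be chosen by whoever implements this:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ring classification; not the amdgpu ring type enum. */
enum ring_type {
	RING_TYPE_GFX,
	RING_TYPE_COMPUTE,
};

/* Hypothetical per-device timeout settings, in milliseconds. */
struct timeout_config {
	int gfx_timeout_ms;	/* e.g. 10000, the default in the quoted code */
	int compute_timeout_ms;	/* assumed to be larger for long compute jobs */
};

/*
 * Pick the job timeout based on the ring type, so long-running compute
 * jobs get more time while the timeout warning stays enabled everywhere.
 */
static int job_timeout_ms(const struct timeout_config *cfg,
			  enum ring_type type)
{
	assert(cfg != NULL);

	return type == RING_TYPE_COMPUTE ? cfg->compute_timeout_ms
					 : cfg->gfx_timeout_ms;
}
```

This way the warning still shows up in the logs for both ring types,
only later for compute.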
Regards,
Christian.
>
> Change-Id: I3a95d856ba4993094dc7b6269649e470c5b053d2
> Signed-off-by: Evan Quan <evan.quan at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 8bd9c3f..9d6a775 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -861,6 +861,13 @@ static void amdgpu_device_check_arguments(struct amdgpu_device *adev)
>  		amdgpu_lockup_timeout = 10000;
>  	}
>  
> +	/*
> +	 * Disable timeout when GPU reset is disabled to avoid confusing
> +	 * timeout messages in the kernel log.
> +	 */
> +	if (amdgpu_gpu_recovery == 0 || amdgpu_gpu_recovery == -1)
> +		amdgpu_lockup_timeout = INT_MAX;
> +
>  	adev->firmware.load_type = amdgpu_ucode_get_load_type(adev, amdgpu_fw_load_type);
>  }
>