[PATCH] drm/amdgpu: disable job timeout on GPU reset disabled

Christian König ckoenig.leichtzumerken at gmail.com
Mon Mar 19 09:42:19 UTC 2018


On 19.03.2018 at 07:08, Evan Quan wrote:
> Under heavy compute workloads (e.g. the dgemm test), the ASIC can
> take more than 10 seconds to finish a single dispatched job, which
> triggers the timeout. The resulting messages are confusing, although
> the timeout does not seem to cause any real problems.
> As a quick workaround, disable the timeout when GPU reset is
> disabled.

NAK, I enabled those warnings intentionally, even when GPU recovery is
disabled, so that the logs give a hint about what goes wrong.

Please only increase the timeout for the compute queues and/or add a
separate timeout for them.

Regards,
Christian.
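
For illustration only, here is one way the suggestion above could be read,
written as standalone C rather than as a driver patch. The enum ring_type,
the variable compute_lockup_timeout_ms and its 60000 ms value are all
hypothetical and do not come from the patch or from amdgpu; only the
10000 ms default mirrors the value visible in the quoted diff.

```c
/* Hypothetical ring classes; the real driver identifies rings differently. */
enum ring_type { RING_GFX, RING_COMPUTE, RING_SDMA };

/* Default job timeout in ms, mirroring amdgpu_lockup_timeout = 10000. */
static int lockup_timeout_ms = 10000;

/* Hypothetical separate, longer timeout for compute jobs (assumed value). */
static int compute_lockup_timeout_ms = 60000;

/*
 * Pick the job timeout for a ring. Compute rings get their own limit;
 * every other ring keeps the default, so timeout warnings still show up
 * in the log for all ring types.
 */
static int ring_timeout_ms(enum ring_type type)
{
	if (type == RING_COMPUTE)
		return compute_lockup_timeout_ms;
	return lockup_timeout_ms;
}
```

With a split like this, a long-running dgemm job on a compute ring would
not trip the 10 second limit, while the timeout stays active everywhere
instead of being set to INT_MAX.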


>
> Change-Id: I3a95d856ba4993094dc7b6269649e470c5b053d2
> Signed-off-by: Evan Quan <evan.quan at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +++++++
>   1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 8bd9c3f..9d6a775 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -861,6 +861,13 @@ static void amdgpu_device_check_arguments(struct amdgpu_device *adev)
>   		amdgpu_lockup_timeout = 10000;
>   	}
>   
> +	/*
> +	 * Disable timeout when GPU reset is disabled to avoid confusing
> +	 * timeout messages in the kernel log.
> +	 */
> +	if (amdgpu_gpu_recovery == 0 || amdgpu_gpu_recovery == -1)
> +		amdgpu_lockup_timeout = INT_MAX;
> +
>   	adev->firmware.load_type = amdgpu_ucode_get_load_type(adev, amdgpu_fw_load_type);
>   }
>   


