[PATCH] drm/amdgpu: extend the default timeout for kernel compute queues
Alex Deucher
alexdeucher at gmail.com
Thu Apr 20 12:56:56 UTC 2023
On Thu, Apr 20, 2023 at 5:19 AM Feifei Xu <Feifei.Xu at amd.com> wrote:
>
> Extend to 120s. The default timeout value should also be extended if the compute
> shader execution time is extended. Otherwise, some stress tests will trigger a
> compute ring timeout in software.
I think that's probably too long. 2 minutes is a long time to have a
hung system. I think we should rework the tests or use ROCm for
long-running test cases.
Alex
>
> Signed-off-by: Feifei Xu <Feifei.Xu at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index e536886f6d42..1f98b4b0a549 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3475,7 +3475,7 @@ static int amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev)
>
> /*
> * By default timeout for non compute jobs is 10000
> - * and 60000 for compute jobs.
> + * and 120000 for compute jobs.
> * In SR-IOV or passthrough mode, timeout for compute
> * jobs are 60000 by default.
> */
> @@ -3485,7 +3485,7 @@ static int amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev)
> adev->compute_timeout = amdgpu_sriov_is_pp_one_vf(adev) ?
> msecs_to_jiffies(60000) : msecs_to_jiffies(10000);
> else
> - adev->compute_timeout = msecs_to_jiffies(60000);
> + adev->compute_timeout = msecs_to_jiffies(120000);
>
> if (strnlen(input, AMDGPU_MAX_TIMEOUT_PARAM_LENGTH)) {
> while ((timeout_setting = strsep(&input, ",")) &&
> --
> 2.34.1
>
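To make the effect of the hunk concrete, here is a minimal userspace sketch in C of the per-job defaults after this patch, plus the comma-separated override string that the truncated strsep() loop at the end of the hunk walks. It is a model, not the driver code. Timeouts stay in milliseconds instead of going through msecs_to_jiffies(). The if condition that selects the SR-IOV branch is not visible in the hunk, so it appears here as a hypothetical is_sriov_vf flag. The names job_timeouts, default_timeouts(), apply_override() and TIMEOUT_INFINITE are illustrative only. The field order gfx,compute,sdma,video, the meaning of 0 (keep the default) and of negative values (never time out), and the rule that a single value covers all non-compute jobs are assumptions based on a reading of the surrounding function. That function is presumably fed by the amdgpu lockup_timeout module parameter.

```c
#define _DEFAULT_SOURCE /* strsep() is a BSD/glibc extension */
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model: milliseconds, whereas the driver stores jiffies. */
struct job_timeouts {
	long gfx_ms;
	long compute_ms;
	long sdma_ms;
	long video_ms;
};

/* Stand-in for the kernel's MAX_SCHEDULE_TIMEOUT ("never time out"). */
#define TIMEOUT_INFINITE LONG_MAX

/* Defaults with the patch applied: 10 s for non-compute jobs and 120 s
 * for compute jobs; the SR-IOV branch (is_sriov_vf is an assumed name for
 * the hidden condition) keeps 60 s for PP one-VF, else 10 s. */
static struct job_timeouts default_timeouts(bool is_sriov_vf, bool pp_one_vf)
{
	struct job_timeouts t = { 10000, 120000, 10000, 10000 };

	if (is_sriov_vf)
		t.compute_ms = pp_one_vf ? 60000 : 10000;
	return t;
}

/* Apply an override such as "0,180000" on top of the defaults.
 * Assumed field order: gfx,compute,sdma,video. 0 keeps the default,
 * a negative value means no timeout. Parsing stops at the first empty
 * field. 'input' is modified in place by strsep(). Returns 0 on success,
 * -1 if a field is not a number. */
static int apply_override(struct job_timeouts *t, char *input, bool virtualized)
{
	long *slots[] = { &t->gfx_ms, &t->compute_ms, &t->sdma_ms, &t->video_ms };
	char *setting;
	int index = 0;

	assert(t != NULL && input != NULL);
	while ((setting = strsep(&input, ",")) != NULL && *setting != '\0') {
		char *end;
		long ms = strtol(setting, &end, 0);

		if (end == setting || *end != '\0')
			return -1;              /* not a number */
		if (ms == 0) {
			index++;                /* keep the default for this slot */
			continue;
		}
		if (ms < 0)
			ms = TIMEOUT_INFINITE;
		if (index < 4)                  /* extra fields are ignored */
			*slots[index] = ms;
		index++;
	}

	/* A single value applies to all non-compute jobs and, when
	 * virtualized (SR-IOV or passthrough, assumed), to compute too. */
	if (index == 1) {
		t->sdma_ms = t->video_ms = t->gfx_ms;
		if (virtualized)
			t->compute_ms = t->gfx_ms;
	}
	return 0;
}
```

For example, passing a writable buffer holding "0,180000" to apply_override() keeps the 10 s graphics default and raises only the compute limit to 180 s. That kind of per-run adjustment lets a long stress test get a larger budget without changing the default for every system.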
More information about the amd-gfx mailing list