[PATCH] drm/amdgpu: extend the default timeout for kernel compute queues

Alex Deucher alexdeucher at gmail.com
Thu Apr 20 12:56:56 UTC 2023


On Thu, Apr 20, 2023 at 5:19 AM Feifei Xu <Feifei.Xu at amd.com> wrote:
>
> Extend to 120s. The default timeout value should also be extended if compute
> shader execution time is extended. Otherwise some stress tests will trigger
> compute ring timeouts in software.

I think that's probably too long. Two minutes is a long time to have a
hung system. I think we should rework the tests or use ROCm for
long-running test cases.

Alex

>
> Signed-off-by: Feifei Xu <Feifei.Xu at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index e536886f6d42..1f98b4b0a549 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3475,7 +3475,7 @@ static int amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev)
>
>         /*
>          * By default timeout for non compute jobs is 10000
> -        * and 60000 for compute jobs.
> +        * and 120000 for compute jobs.
>          * In SR-IOV or passthrough mode, timeout for compute
>          * jobs are 60000 by default.
>          */
> @@ -3485,7 +3485,7 @@ static int amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev)
>                 adev->compute_timeout = amdgpu_sriov_is_pp_one_vf(adev) ?
>                                         msecs_to_jiffies(60000) : msecs_to_jiffies(10000);
>         else
> -               adev->compute_timeout =  msecs_to_jiffies(60000);
> +               adev->compute_timeout =  msecs_to_jiffies(120000);
>
>         if (strnlen(input, AMDGPU_MAX_TIMEOUT_PARAM_LENGTH)) {
>                 while ((timeout_setting = strsep(&input, ",")) &&
> --
> 2.34.1
>
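The hunks above choose a per-queue default and then let a comma-separated string (the `strsep(&input, ",")` loop) override it. Below is a minimal userspace model of that flow. It is not the driver's code. Several things in it are assumptions: every `model_*` name and `MODEL_NO_TIMEOUT` are hypothetical, HZ is assumed to be 250 (the real value is a kernel build option), the `sriov` flag stands in for the branch condition the hunk does not show, and the override semantics (slot order gfx,compute,sdma,video; `0` keeps the default; a negative value disables the timeout) are this sketch's own choices. The model also shows that the parsed string can change the compute slot alone, without touching the compiled-in default.

```c
#define _DEFAULT_SOURCE /* exposes strsep() in glibc */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: HZ = 250; in the kernel this is a build-time option. */
#define MODEL_HZ 250UL
/* Hypothetical stand-in for "never time out". */
#define MODEL_NO_TIMEOUT ((unsigned long)-1)

/* Userspace stand-in for msecs_to_jiffies(): rounds up to whole ticks. */
static unsigned long model_msecs_to_jiffies(unsigned long ms)
{
	return (ms * MODEL_HZ + 999) / 1000;
}

struct model_timeouts {
	unsigned long gfx, compute, sdma, video; /* all in jiffies */
};

/* Defaults mirroring the hunk: 10 s for non-compute jobs. Compute uses 60 s
 * (one VF) or 10 s in the SR-IOV branch, otherwise compute_default_ms
 * (60000 before the patch, 120000 after it). */
static void model_defaults(struct model_timeouts *t, bool sriov, bool one_vf,
			   unsigned long compute_default_ms)
{
	assert(t != NULL);
	t->gfx = t->sdma = t->video = model_msecs_to_jiffies(10000);
	if (sriov)
		t->compute = model_msecs_to_jiffies(one_vf ? 60000 : 10000);
	else
		t->compute = model_msecs_to_jiffies(compute_default_ms);
}

/* Apply a "gfx,compute,sdma,video" override string given in milliseconds.
 * Parsing stops at the first empty field. Returns -1 on a malformed number. */
static int model_apply_override(struct model_timeouts *t, const char *param)
{
	unsigned long *slots[] = { &t->gfx, &t->compute, &t->sdma, &t->video };
	char buf[64];
	char *input = buf, *tok;
	int index = 0;

	assert(t != NULL && param != NULL);
	strncpy(buf, param, sizeof(buf) - 1); /* strsep() modifies its input */
	buf[sizeof(buf) - 1] = '\0';

	while ((tok = strsep(&input, ",")) && *tok) {
		char *end;
		long ms = strtol(tok, &end, 0);

		if (end == tok || *end != '\0')
			return -1;
		if (ms != 0 && index < 4) /* 0 keeps the existing default */
			*slots[index] = ms < 0 ? MODEL_NO_TIMEOUT
					       : model_msecs_to_jiffies((unsigned long)ms);
		index++;
	}
	return 0;
}
```

With the assumed HZ of 250, the pre-patch bare-metal compute default of 60000 ms becomes 15000 jiffies, and the proposed 120000 ms becomes 30000. Passing `"0,120000"` to `model_apply_override` keeps the gfx default and raises only the compute slot.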


More information about the amd-gfx mailing list