[PATCH] drm/amdgpu: extend the default timeout for kernel compute queues

Christian König ckoenig.leichtzumerken at gmail.com
Thu Apr 20 15:20:45 UTC 2023


On 20.04.23 at 14:56, Alex Deucher wrote:
> On Thu, Apr 20, 2023 at 5:19 AM Feifei Xu <Feifei.Xu at amd.com> wrote:
>> Extend to 120s. The default timeout value should also extend if compute
>> shader execution time extended. Otherwise some stress test will trigger
>> compute ring timeout in software.
> I think that's probably too long.  2 minutes is a long time to have a
> hung system.  I think we should rework the tests or use ROCm for long
> running test cases.

Yeah, agreed. This has come up multiple times now, and even 60000 ms is
actually way too much.

You need to keep in mind that other work depends on these jobs, so in
case of a bug the system can already need up to 60 seconds to become
responsive again.
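
To make the numbers concrete, here is a rough user-space sketch of the
default selection that the quoted hunk touches. It is not the kernel
code: the function name, the enum names and the boolean flags are
hypothetical, and the `if` condition that precedes the hunk is not
visible in the diff, so it is modelled here as a plain `virtualized`
flag. Values are in milliseconds, as in the patch.

```c
#include <stdbool.h>

/* Hypothetical names; the values mirror the ones in the quoted diff. */
enum {
    TIMEOUT_SHORT_MS = 10000, /* non-compute default per the code comment */
    TIMEOUT_LONG_MS  = 60000  /* current compute default                  */
};

/*
 * Simplified model of the default compute timeout before the patch.
 * "virtualized" stands in for the condition not shown in the hunk
 * (assumption); "pp_one_vf" stands in for the result of
 * amdgpu_sriov_is_pp_one_vf().
 */
static unsigned int default_compute_timeout_ms(bool virtualized, bool pp_one_vf)
{
    if (virtualized)
        return pp_one_vf ? TIMEOUT_LONG_MS : TIMEOUT_SHORT_MS;

    /* Bare-metal case: the value the patch proposes to raise to 120000. */
    return TIMEOUT_LONG_MS;
}
```

Whatever this returns is roughly how long anything waiting on a hung
compute job stays blocked before recovery even starts, which is why
doubling it is a problem.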

So clearly a NAK for this.

Christian.

>
> Alex
>
>> Signed-off-by: Feifei Xu <Feifei.Xu at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index e536886f6d42..1f98b4b0a549 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -3475,7 +3475,7 @@ static int amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev)
>>
>>          /*
>>           * By default timeout for non compute jobs is 10000
>> -        * and 60000 for compute jobs.
>> +        * and 120000 for compute jobs.
>>           * In SR-IOV or passthrough mode, timeout for compute
>>           * jobs are 60000 by default.
>>           */
>> @@ -3485,7 +3485,7 @@ static int amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev)
>>                  adev->compute_timeout = amdgpu_sriov_is_pp_one_vf(adev) ?
>>                                          msecs_to_jiffies(60000) : msecs_to_jiffies(10000);
>>          else
>> -               adev->compute_timeout =  msecs_to_jiffies(60000);
>> +               adev->compute_timeout =  msecs_to_jiffies(120000);
>>
>>          if (strnlen(input, AMDGPU_MAX_TIMEOUT_PARAM_LENGTH)) {
>>                  while ((timeout_setting = strsep(&input, ",")) &&
>> --
>> 2.34.1
>>


