[PATCH] drm/amdgpu: no job timeout setting on compute queues

Quan, Evan Evan.Quan at amd.com
Tue Mar 27 01:33:51 UTC 2018


Hi Monk,

That’s a fix reached after a long discussion (I believe you were in that mail thread).
50s is for the dgemm test. We are not sure whether it’s enough for other compute use cases.
If virtualization still needs these messages, I believe we can list that as an exception:

-                                  (ring->funcs->type == AMDGPU_RING_TYPE_COMPUTE) ?
+                                  /* for non-sriov case, no timeout enforce on compute ring */
+                                  ((ring->funcs->type == AMDGPU_RING_TYPE_COMPUTE) && !amdgpu_sriov_vf(ring->adev)) ?
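
To make the intent of the condition above easier to follow, here is a minimal standalone sketch. It is not the driver code: the enum, the helper name pick_job_timeout() and its parameters are hypothetical stand-ins, and it assumes the true branch of the ternary selects MAX_SCHEDULE_TIMEOUT (i.e. no timeout), as the patch subject suggests.

```c
#include <limits.h>
#include <stdbool.h>

/* Stand-in for the kernel definition (LONG_MAX there as well). */
#define MAX_SCHEDULE_TIMEOUT LONG_MAX

/* Hypothetical stand-in for the amdgpu ring type. */
enum ring_type { RING_TYPE_GFX, RING_TYPE_COMPUTE };

/*
 * Hypothetical helper mirroring the condition in the diff:
 * compute rings get no job timeout, unless we run as an SR-IOV VF,
 * in which case the normal lockup timeout is kept so that the
 * timeout messages are still printed.
 */
static long pick_job_timeout(enum ring_type type, bool is_sriov_vf,
                             long lockup_timeout)
{
    if (type == RING_TYPE_COMPUTE && !is_sriov_vf)
        return MAX_SCHEDULE_TIMEOUT;

    return lockup_timeout;
}
```

With this shape, a bare-metal compute ring gets MAX_SCHEDULE_TIMEOUT, while a compute ring under SR-IOV and any non-compute ring keep the lockup_timeout value passed in.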

Regards,
Evan
From: Deucher, Alexander
Sent: Monday, March 26, 2018 11:20 PM
To: Liu, Monk <Monk.Liu at amd.com>; Quan, Evan <Evan.Quan at amd.com>; Michel Dänzer <michel at daenzer.net>
Cc: amd-gfx at lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: no job timeout setting on compute queues


That's fine with me too.  Or make them DRM_INFO.



Alex

________________________________
From: Liu, Monk
Sent: Monday, March 26, 2018 8:55:51 AM
To: Quan, Evan; Deucher, Alexander; Michel Dänzer
Cc: amd-gfx at lists.freedesktop.org
Subject: RE: [PATCH] drm/amdgpu: no job timeout setting on compute queues


Besides, if some compute shader takes more than 50 seconds, you can just set the lockup timeout to 50s.

Why change the logic on the KMD side?



I don’t think it’s a good idea to disable the timeout message for the compute ring; we have virtualization end users who still want those messages printed out.



Can you do it this way?

In amdgpu_job_timeout, can you use DRM_WARN instead of DRM_ERROR for jobs that belong to the CPC engine?



/Monk



From: amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] On Behalf Of Quan, Evan
Sent: March 19, 2018 9:47
To: Deucher, Alexander <Alexander.Deucher at amd.com>; Michel Dänzer <michel at daenzer.net>
Cc: amd-gfx at lists.freedesktop.org
Subject: RE: [PATCH] drm/amdgpu: no job timeout setting on compute queues



That’s fine for me. Will update the patch accordingly.



Regards,

Evan

From: Deucher, Alexander
Sent: Saturday, March 17, 2018 1:17 AM
To: Michel Dänzer <michel at daenzer.net>; Quan, Evan <Evan.Quan at amd.com>
Cc: amd-gfx at lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: no job timeout setting on compute queues



How about something like this:

/* Disable timeout when GPU reset is disabled to avoid confusing timeout messages in the kernel log */
if (amdgpu_gpu_recovery == 0 || amdgpu_gpu_recovery == -1)
    amdgpu_lockup_timeout = MAX_SCHEDULE_TIMEOUT;
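
As a standalone illustration of the check above, a minimal sketch follows. It is not the driver code: effective_lockup_timeout() and its parameters are hypothetical names, and the module parameters are passed in as plain arguments rather than read from globals.

```c
#include <limits.h>

/* Stand-in for the kernel definition (LONG_MAX there as well). */
#define MAX_SCHEDULE_TIMEOUT LONG_MAX

/*
 * Hypothetical helper: return the timeout that would be in effect
 * for a given gpu_recovery setting. Values 0 and -1 are treated as
 * "reset not enabled", so the timeout is effectively disabled; any
 * other value keeps the configured lockup timeout.
 */
static long effective_lockup_timeout(int gpu_recovery, long lockup_timeout)
{
    if (gpu_recovery == 0 || gpu_recovery == -1)
        return MAX_SCHEDULE_TIMEOUT;

    return lockup_timeout;
}
```

Note that with gpu_recovery set to 1 the configured timeout is kept, which is the case Michel raised.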



Alex

________________________________

From: Michel Dänzer <michel at daenzer.net>
Sent: Friday, March 16, 2018 12:25:03 PM
To: Deucher, Alexander; Quan, Evan
Cc: amd-gfx at lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: no job timeout setting on compute queues



On 2018-03-16 05:14 PM, Deucher, Alexander wrote:
> Since GPU reset is not enabled yet anyway, a timeout will just print a
> message, can we just change amdgpu_lockup_timeout to
> MAX_SCHEDULE_TIMEOUT until we enable GPU reset?

Wouldn't that be rather surprising for somebody who sets
amdgpu.gpu_recovery=1 ?


--
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer