[PATCH] drm/amdgpu: support gpu recovery tests on compute rings

Quan, Evan Evan.Quan at amd.com
Sun Apr 28 05:37:27 UTC 2019


How about amdgpu.lockup_timeout=non-compute-jobs[, gfx, sdma, decode, encode][: compute-jobs] ?
This will not break backward compatibility.

And I’m not sure how to map “decode” and “encode” to the uvd/vce/vcn rings,
since there are many rings related to these IPs (uvd, uvd_enc, vce, vcn_dec, vcn_enc, vcn_jpeg).
Maybe we should use the IP names (uvd, vce or vcn) instead of “decode/encode”?

Regards,
Evan
From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Deucher, Alexander
Sent: April 26, 2019 22:24
To: Michel Dänzer <michel at daenzer.net>; Quan, Evan <Evan.Quan at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>
Cc: Xu, Feifei <Feifei.Xu at amd.com>; Cui, Flora <Flora.Cui at amd.com>; amd-gfx at lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: support gpu recovery tests on compute rings

How about an interface to change the timeout on a per engine (gfx, compute, dma, etc.) basis?
amdgpu.lockup_timeout=<global>[,<gfx>,<compute>,<sdma>,<decode>,<encode>]
If only one parameter is given, we change it globally.  If more are given, they override the global value for the corresponding engines.  Could also do a sysfs interface to change it on the fly.
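
To make the proposed semantics concrete, here is a rough sketch of how such a string could be interpreted. It is written in Python purely for illustration and is not driver code. The function name, the engine order (taken from the syntax line above), and the rule that an empty field keeps the global value are all assumptions, not something settled in this thread.

```python
from typing import Dict

# Engine order assumed from the proposed syntax:
# <global>,<gfx>,<compute>,<sdma>,<decode>,<encode>
ENGINES = ("gfx", "compute", "sdma", "decode", "encode")


def parse_lockup_timeout(value: str) -> Dict[str, int]:
    """Hypothetical parser: the first field is the global timeout in ms,
    any further fields override it per engine, in ENGINES order."""
    fields = [f.strip() for f in value.split(",")]
    if fields[0] == "":
        raise ValueError("a global timeout is required")
    if len(fields) > 1 + len(ENGINES):
        raise ValueError("too many values")

    global_timeout = int(fields[0])
    # Start with every engine using the global value.
    timeouts = {engine: global_timeout for engine in ENGINES}
    # Apply the per-engine overrides that were actually given.
    for engine, field in zip(ENGINES, fields[1:]):
        if field:  # assumption: an empty field keeps the global value
            timeouts[engine] = int(field)
    return timeouts
```

With this reading, "10000" applies 10000 ms to every engine, while "10000,5000,0" would set gfx to 5000 and compute to 0 and leave the remaining engines at 10000.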

Alex
________________________________
From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> on behalf of Michel Dänzer <michel at daenzer.net>
Sent: Friday, April 26, 2019 4:35 AM
To: Quan, Evan; Koenig, Christian
Cc: Xu, Feifei; Cui, Flora; amd-gfx at lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: support gpu recovery tests on compute rings

On 2019-04-26 10:20 a.m., Quan, Evan wrote:
> My concern is there is already one module parameter "lockup_timeout".
> parm:           lockup_timeout:GPU lockup timeout in ms > 0 (default 10000) (int)
>
> Adding one more "timeout" seems redundant.
> And that would make the description of "lockup_timeout" (which reads as applying to all jobs) no longer match its real effect (it affects only non-compute jobs).
>
> A better way is to rename "lockup_timeout" to "non-compute lockup_timeout". But I do not think we can change existing module parameter. Right?

Right. Also, there are already too many amdgpu module parameters, we
should try to remove some rather than adding new ones for every little
thing that could be tweaked. :)

One possibility might be to optionally allow passing multiple values to
lockup_timeout, e.g.

 amdgpu.lockup_timeout=10000,0

The first value would need to have the same meaning as now for backwards
compatibility.
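
As an illustration only, the backward-compatible two-value form could be read along these lines. This is a Python sketch, not driver code; the function name is hypothetical, and treating the second value as the compute timeout is an assumption drawn from the context of this thread.

```python
from typing import Optional, Tuple


def parse_two_value_timeout(value: str) -> Tuple[int, Optional[int]]:
    """Hypothetical parser for "A" or "A,B".

    A keeps today's meaning of lockup_timeout. B is assumed to be a
    separate timeout for compute jobs; None means it was not given and
    the driver would fall back to whatever default it chooses.
    """
    parts = value.split(",")
    if len(parts) > 2:
        raise ValueError("expected at most two values")
    first = int(parts[0])
    second = int(parts[1]) if len(parts) == 2 else None
    return first, second
```

An existing setting such as "10000" still parses and yields the same first value as before, which is the compatibility property the proposal depends on.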


--
Earthling Michel Dänzer               |              https://www.amd.com
Libre software enthusiast             |             Mesa and X developer