[PATCH v2 1/2] drm: Add GPU reset sysfs event

Christian König christian.koenig at amd.com
Wed Mar 23 15:25:02 UTC 2022


[Adding Marek and Andrey as well]

On 23.03.22 at 16:14, Daniel Vetter wrote:
> On Wed, 23 Mar 2022 at 15:07, Daniel Stone <daniel at fooishbar.org> wrote:
>> Hi,
>>
>> On Mon, 21 Mar 2022 at 16:02, Rob Clark <robdclark at gmail.com> wrote:
>>> On Mon, Mar 21, 2022 at 2:30 AM Christian König
>>> <christian.koenig at amd.com> wrote:
>>>> Well you can, it just means that their contexts are lost as well.
>>> Which is rather inconvenient when deqp-egl reset tests, for example,
>>> take down your compositor ;-)
>> Yeah. Or anything WebGL.
>>
>> System-wide collateral damage is definitely a non-starter. If that
>> means that the userspace driver has to do what iris does and ensure
>> everything's recreated and resubmitted, that works too, just as long
>> as the response to 'my adblocker didn't detect a crypto miner ad' is
>> something better than 'shoot the entire user session'.
> Not sure where that idea came from, I thought at least I made it clear
> that legacy gl _has_ to recover. It's only vk and arb_robustness gl
> which should die without recovery attempt.
>
> The entire discussion here is who should be responsible for replay and
> at least if you can decide the uapi, then punting that entirely to
> userspace is a good approach.

Yes, completely agree. We took the approach of re-submitting things in
the kernel, and that failed quite miserably.

In other words, a GPU reset currently has something like a 99% chance of
taking down your whole desktop.

Daniel, can you briefly explain what exactly iris does when a lost
context is detected without GL robustness?

It sounds like you guys got that working quite well.

Thanks,
Christian.

>
> Ofc it'd be nice if the collateral damage is limited, i.e. requests
> not currently on the gpu, or on different engines and all that
> shouldn't be nuked, if possible.
>
> Also ofc since msm uapi is that the kernel tries to recover there's
> not much we can do there, contexts cannot be shot. But still trying to
> replay them as much as possible feels a bit like overkill.
> -Daniel
>
>> Cheers,
>> Daniel
>
>


