[PATCH v10 1/4] drm: Introduce device wedged event
Christian König
christian.koenig at amd.com
Tue Dec 3 10:18:00 UTC 2024
On 03.12.24 at 06:00, Raag Jadav wrote:
> On Mon, Dec 02, 2024 at 10:07:59AM +0200, Raag Jadav wrote:
>> On Fri, Nov 29, 2024 at 10:40:14AM -0300, André Almeida wrote:
>>> Hi Raag,
>>>
>>> On 28/11/2024 12:37, Raag Jadav wrote:
>>>> Introduce device wedged event, which notifies userspace of 'wedged'
>>>> (hung/unusable) state of the DRM device through a uevent. This is
>>>> useful especially in cases where the device is no longer operating as
>>>> expected and has become unrecoverable from driver context. The purpose
>>>> of this implementation is to provide drivers a generic way to recover with
>>>> the help of userspace intervention without taking any drastic measures
>>>> in the driver.
>>>>
>>>> A 'wedged' device is basically a dead device that needs attention. The
>>>> uevent is the notification that is sent to userspace along with a hint
>>>> about what could possibly be attempted to recover the device and bring
>>>> it back to usable state. Different drivers may have different ideas of
>>>> a 'wedged' device depending on their hardware implementation, and hence
>>>> the vendor agnostic nature of the event. It is up to the drivers to
>>>> decide when they see the need for device recovery and how they want to
>>>> recover from the available methods.
>>>>
>>> Thank you for your work. Do you think you can add the optional PID
>>> parameter, i.e. the PID of the app that caused the reset? For the SteamOS
>>> use case it has proved useful to kill the faulty app as well. If the reset
>>> was caused by a kthread, no PID can be provided, which is why the parameter
>>> is optional.
>> Hmm, I'm not sure if it really fits here since it doesn't seem like
>> a generic use case.
>>
>> I'd still be open to it if the drivers find it useful, but perhaps
>> as an extended feature in a separate series.
> What do you think, Chris? Are we good to go with v10?
I agree with André that the PID and maybe the new DRM client name would
be really nice to have here.

We do have that information in the device core dump we create, but if an
application is supervised by a daemon, for example, then having it in the
event as well would be really useful.

On the other hand, I think we should merge the documentation and code as
is and then add the PID/name later on. That is essentially a separate
discussion.
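
For illustration only, here is a minimal sketch of how a supervising
daemon might consume such an event if an optional PID were added later.
The key names WEDGED and PID, the comma-separated list of recovery
hints, and the helper names are all assumptions made for this example;
nothing in this thread fixes the actual uevent format.

```python
from typing import Optional


def parse_uevent(payload: str) -> dict:
    """Split a uevent payload of KEY=value lines into a dict."""
    fields = {}
    for line in payload.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that carry no KEY=value pair
            fields[key] = value
    return fields


def describe_wedged(payload: str) -> Optional[dict]:
    """Return recovery hints and the optional PID of a wedged event.

    WEDGED and PID are hypothetical key names used for this sketch.
    Returns None when the payload is not a wedged event.
    """
    fields = parse_uevent(payload)
    if "WEDGED" not in fields:
        return None
    # Assumed format: recovery hints as a comma-separated list.
    methods = [m for m in fields["WEDGED"].split(",") if m]
    # The PID is optional, e.g. absent when a kthread caused the reset.
    pid_text = fields.get("PID", "")
    pid = int(pid_text) if pid_text.isdigit() else None
    return {"methods": methods, "pid": pid}
```

A daemon could then decide whether to kill the offending process only
when a PID is present, and fall back to the recovery hints otherwise.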
Regards,
Christian.
>
> Raag