[PATCH v4 1/3] drm: Introduce device wedged event

Lucas De Marchi lucas.demarchi at intel.com
Mon Sep 9 20:01:50 UTC 2024


On Sun, Sep 08, 2024 at 11:08:39PM GMT, Asahi Lina wrote:
>
>
>On 9/8/24 12:07 AM, Lucas De Marchi wrote:
>> On Sat, Sep 07, 2024 at 08:38:30PM GMT, Asahi Lina wrote:
>>>
>>>
>>> On 9/6/24 6:42 PM, Raag Jadav wrote:
>>>> Introduce device wedged event, which will notify userspace of wedged
>>>> (hanged/unusable) state of the DRM device through a uevent. This is
>>>> useful especially in cases where the device is in unrecoverable state
>>>> and requires userspace intervention for recovery.
>>>>
>>>> Purpose of this implementation is to be vendor agnostic. Userspace
>>>> consumers (sysadmin) can define udev rules to parse this event and
>>>> take respective action to recover the device.
>>>>
>>>> Consumer expectations:
>>>> ----------------------
>>>> 1) Unbind driver
>>>> 2) Reset bus device
>>>> 3) Re-bind driver
>>>
>>> Is this supposed to be normative? For drm/asahi we have a "wedged"
>>> concept (firmware crashed), but the only possible recovery action is a
>>> full system reboot (which might still be desirable to allow userspace to
>>> trigger automatically in some scenarios) since there is no bus-level
>>> reset and no firmware reload possible.
>>
>> maybe let drivers hint possible/supported recovery mechanisms and then
>> sysadmin chooses what to do?
>
>How would we do this? A textual value for the event or something like
>that? ("WEDGED=bus-reset" vs "WEDGED=reboot"?)

If there's a need for more than one, then I think exposing the supported
ones sorted by "side effect" in sysfs would be good. Something like:

	$ cat /sys/class/drm/card0/device/wedge_recover
	rebind
	bus-reset
	reboot
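
A userspace consumer could then walk that list top-down and take the
first method its policy allows. Rough sketch only: the wedge_recover
attribute is just the proposal above and doesn't exist anywhere, and the
one-method-per-line format, the helper names and the "allowed" policy
set are assumptions for illustration.

```python
from pathlib import Path

# Hypothetical attribute from the proposal above; not an existing sysfs ABI.
WEDGE_RECOVER = Path("/sys/class/drm/card0/device/wedge_recover")


def parse_methods(text):
    """Return advertised methods in file order (least side effect first)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def pick_method(advertised, allowed):
    """Return the first advertised method the admin policy allows, else None."""
    for method in advertised:
        if method in allowed:
            return method
    return None


if __name__ == "__main__":
    # Assumed policy: this admin never wants an automatic reboot.
    allowed = {"rebind", "bus-reset"}
    try:
        advertised = parse_methods(WEDGE_RECOVER.read_text())
    except OSError:
        advertised = []  # attribute absent, so nothing is advertised
    print(pick_method(advertised, allowed))
```

What to actually do for each method (unbind/bind, bus reset, reboot) is
left out on purpose, since that part is policy for the sysadmin.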

Although if the device is actually in an unrecoverable state, where the
only option left is "reboot", you could simply remove the underlying
device from the kernel side, with no userspace intervention.

Lucas De Marchi

>
>~~ Lina

