[PATCH v4 1/3] drm: Introduce device wedged event

Lucas De Marchi lucas.demarchi at intel.com
Tue Sep 10 16:06:50 UTC 2024


On Tue, Sep 10, 2024 at 06:53:19PM GMT, Raag Jadav wrote:
>On Mon, Sep 09, 2024 at 03:01:50PM -0500, Lucas De Marchi wrote:
>> On Sun, Sep 08, 2024 at 11:08:39PM GMT, Asahi Lina wrote:
>> > On 9/8/24 12:07 AM, Lucas De Marchi wrote:
>> > > On Sat, Sep 07, 2024 at 08:38:30PM GMT, Asahi Lina wrote:
>> > > > On 9/6/24 6:42 PM, Raag Jadav wrote:
>> > > > > Introduce device wedged event, which will notify userspace of the wedged
>> > > > > (hung/unusable) state of the DRM device through a uevent. This is
>> > > > > useful especially in cases where the device is in an unrecoverable state
>> > > > > and requires userspace intervention for recovery.
>> > > > >
>> > > > > Purpose of this implementation is to be vendor agnostic. Userspace
>> > > > > consumers (sysadmin) can define udev rules to parse this event and
>> > > > > take respective action to recover the device.
>> > > > >
>> > > > > Consumer expectations:
>> > > > > ----------------------
>> > > > > 1) Unbind driver
>> > > > > 2) Reset bus device
>> > > > > 3) Re-bind driver
>> > > >
>> > > > Is this supposed to be normative? For drm/asahi we have a "wedged"
>> > > > concept (firmware crashed), but the only possible recovery action is a
>> > > > full system reboot (which might still be desirable to allow userspace to
>> > > > trigger automatically in some scenarios) since there is no bus-level
>> > > > reset and no firmware reload possible.
>> > >
>> > > maybe let drivers hint possible/supported recovery mechanisms and then
>> > > sysadmin chooses what to do?
>> >
>> > How would we do this? A textual value for the event or something like
>> > that? ("WEDGED=bus-reset" vs "WEDGED=reboot"?)
>>
>> If there's a need for more than one, then I think exposing the supported
>> ones sorted by "side effect" in sysfs would be good. Something like:
>>
>> 	$ cat /sys/class/drm/card0/device/wedge_recover
>> 	rebind
>> 	bus-reset
>> 	reboot
>
>How do we expect the drivers to flag supported ones? Extra hooks?

See my comment above: wedge_recover would be a sysfs attribute exposed by
the driver to userspace, listing the supported recovery mechanisms.

WEDGED=<mechanism> (crafted either by the driver or through explicit
helper functions in drm) would then report to userspace the minimum
mechanism needed for recovery.
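
To illustrate how a userspace consumer might combine the two, here is a
rough sketch, not a proposed implementation. It assumes the uevent payload
is plain KEY=VALUE lines, that wedge_recover lists one mechanism per line
ordered from least to most side effect (as in the example above), and that
anything at or beyond the reported minimum is an acceptable choice. The
function names are made up for illustration, and neither the WEDGED values
nor the sysfs file are settled interfaces.

```python
# Sketch only: WEDGED=<mechanism> and wedge_recover are proposals from this
# thread, not an existing kernel ABI. All function names are hypothetical.


def parse_uevent(payload):
    """Parse KEY=VALUE lines of a uevent payload into a dict."""
    props = {}
    for line in payload.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without '='
            props[key.strip()] = value.strip()
    return props


def candidate_mechanisms(uevent_payload, wedge_recover_text):
    """Return the supported mechanisms that satisfy the reported minimum.

    Assumes wedge_recover_text is sorted by increasing side effect, so every
    entry from the reported minimum onward is strong enough to recover.
    Returns an empty list if WEDGED is missing or names an unsupported
    mechanism.
    """
    supported = [m.strip() for m in wedge_recover_text.splitlines() if m.strip()]
    minimum = parse_uevent(uevent_payload).get("WEDGED")
    if minimum not in supported:
        return []
    return supported[supported.index(minimum):]


if __name__ == "__main__":
    # Contents as in the wedge_recover example above; payload is illustrative.
    supported_text = "rebind\nbus-reset\nreboot\n"
    print(candidate_mechanisms("WEDGED=bus-reset", supported_text))
```

The sysadmin policy (e.g. a udev rule calling a helper) would then pick
from that list, typically the first entry, or something heavier if local
policy prefers it.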

Lucas De Marchi

