[PATCH v4 1/3] drm: Introduce device wedged event
Raag Jadav
raag.jadav at intel.com
Tue Sep 10 15:53:19 UTC 2024
On Mon, Sep 09, 2024 at 03:01:50PM -0500, Lucas De Marchi wrote:
> On Sun, Sep 08, 2024 at 11:08:39PM GMT, Asahi Lina wrote:
> > On 9/8/24 12:07 AM, Lucas De Marchi wrote:
> > > On Sat, Sep 07, 2024 at 08:38:30PM GMT, Asahi Lina wrote:
> > > > On 9/6/24 6:42 PM, Raag Jadav wrote:
> > > > > Introduce device wedged event, which will notify userspace of wedged
> > > > > (hanged/unusable) state of the DRM device through a uevent. This is
> > > > > useful especially in cases where the device is in unrecoverable state
> > > > > and requires userspace intervention for recovery.
> > > > >
> > > > > Purpose of this implementation is to be vendor agnostic. Userspace
> > > > > consumers (sysadmin) can define udev rules to parse this event and
> > > > > take respective action to recover the device.
> > > > >
> > > > > Consumer expectations:
> > > > > ----------------------
> > > > > 1) Unbind driver
> > > > > 2) Reset bus device
> > > > > 3) Re-bind driver
> > > >
> > > > Is this supposed to be normative? For drm/asahi we have a "wedged"
> > > > concept (firmware crashed), but the only possible recovery action is a
> > > > full system reboot (which might still be desirable to allow userspace to
> > > > trigger automatically in some scenarios) since there is no bus-level
> > > > reset and no firmware reload possible.
> > >
> > > maybe let drivers hint possible/supported recovery mechanisms and then
> > > sysadmin chooses what to do?
> >
> > How would we do this? A textual value for the event or something like
> > that? ("WEDGED=bus-reset" vs "WEDGED=reboot"?)
>
> If there's a need for more than one, then I think exposing the supported
> ones sorted by "side effect" in sysfs would be good. Something like:
>
> $ cat /sys/class/drm/card0/device/wedge_recover
> rebind
> bus-reset
> reboot

How do we expect the drivers to flag supported ones? Extra hooks?
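
To make the question concrete, here is a rough sketch of one alternative to
extra hooks: the driver sets a bitmask of supported methods, and a common
helper formats it for the proposed wedge_recover file. It is plain userspace C
for illustration only. The enum, the name table and wedge_recover_show() are
all hypothetical and not part of this series; only the three method names and
their ordering come from the example above.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical recovery methods. Bit position doubles as the sort key:
 * lower bit means smaller side effect.
 */
enum wedge_recover {
	WEDGE_RECOVER_REBIND    = 1u << 0,
	WEDGE_RECOVER_BUS_RESET = 1u << 1,
	WEDGE_RECOVER_REBOOT    = 1u << 2,
};

/* Names indexed by bit position, matching the enum above. */
static const char *const wedge_recover_names[] = {
	"rebind", "bus-reset", "reboot",
};

/*
 * Write every method set in @mask into @buf, one per line, ordered by
 * side effect. Returns the number of bytes written (excluding the NUL),
 * or -1 if @buf is too small.
 */
static int wedge_recover_show(unsigned int mask, char *buf, size_t size)
{
	size_t count = sizeof(wedge_recover_names) / sizeof(wedge_recover_names[0]);
	size_t len = 0;
	size_t i;

	if (size == 0)
		return -1;
	buf[0] = '\0';

	for (i = 0; i < count; i++) {
		int n;

		if (!(mask & (1u << i)))
			continue;

		n = snprintf(buf + len, size - len, "%s\n", wedge_recover_names[i]);
		if (n < 0 || (size_t)n >= size - len)
			return -1;	/* output would have been truncated */
		len += (size_t)n;
	}

	return (int)len;
}
```

With something like this, a driver such as drm/asahi would only set the
reboot bit, while a driver that supports all three would set the full mask
and get the list shown above.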

Raag