[PATCH v4 1/3] drm: Introduce device wedged event

Rodrigo Vivi rodrigo.vivi at intel.com
Mon Sep 9 20:43:15 UTC 2024


On Sun, Sep 08, 2024 at 11:08:39PM +0900, Asahi Lina wrote:
> 
> 
> On 9/8/24 12:07 AM, Lucas De Marchi wrote:
> > On Sat, Sep 07, 2024 at 08:38:30PM GMT, Asahi Lina wrote:
> >>
> >>
> >> On 9/6/24 6:42 PM, Raag Jadav wrote:
> >>> Introduce device wedged event, which will notify userspace of wedged
> >>> (hanged/unusable) state of the DRM device through a uevent. This is
> >>> useful especially in cases where the device is in unrecoverable state
> >>> and requires userspace intervention for recovery.
> >>>
> >>> Purpose of this implementation is to be vendor agnostic. Userspace
> >>> consumers (sysadmin) can define udev rules to parse this event and
> >>> take respective action to recover the device.
> >>>
> >>> Consumer expectations:
> >>> ----------------------
> >>> 1) Unbind driver
> >>> 2) Reset bus device
> >>> 3) Re-bind driver
> >>
> >> Is this supposed to be normative? For drm/asahi we have a "wedged"
> >> concept (firmware crashed), but the only possible recovery action is a
> >> full system reboot (which might still be desirable to allow userspace to
> >> trigger automatically in some scenarios) since there is no bus-level
> >> reset and no firmware reload possible.
> > 
> > maybe let drivers hint possible/supported recovery mechanisms and then
> > sysadmin chooses what to do?
> 
> How would we do this? A textual value for the event or something like
> that? ("WEDGED=bus-reset" vs "WEDGED=reboot"?)

Looks like a good idea.

In our case it is not just a 'bus-reset' but unbind + bus reset + rebind.
Still, it should be okay to use a 'bus-reset' kind of text and have the
driver document what it means.
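
As a rough illustration of the hint idea, a userspace consumer could look
something like the Python sketch below. The WEDGED key and its 'bus-reset' /
'reboot' values are only the strings floated in this thread, not an agreed
ABI, and parse_uevent(), plan_recovery() and RECOVERY_STEPS are made-up names
for illustration. It only assumes the usual kobject uevent layout of an
"action@devpath" header followed by NUL-separated KEY=VALUE fields.

```python
# Sketch only: "WEDGED" and its values are the proposal from this thread,
# not an established kernel ABI.

def parse_uevent(payload: bytes) -> dict:
    """Split a NUL-separated uevent payload into a KEY -> VALUE dict."""
    env = {}
    for field in payload.split(b"\0"):
        key, sep, value = field.decode("utf-8", "replace").partition("=")
        if sep:  # skips the "action@devpath" header and empty fields
            env[key] = value
    return env


# Hypothetical policy table: recovery hint -> ordered steps that the
# sysadmin's tooling would carry out. The driver documents what each
# hint means; for 'bus-reset' here that is unbind + bus reset + rebind.
RECOVERY_STEPS = {
    "bus-reset": ["unbind", "bus-reset", "rebind"],
    "reboot": ["reboot"],
}


def plan_recovery(env: dict) -> list:
    """Return the recovery steps for a wedged event, or [] if none apply."""
    hint = env.get("WEDGED")
    if hint is None:
        return []  # not a wedged event
    return RECOVERY_STEPS.get(hint, [])  # unknown hints are left alone
```

Keeping the mapping in a table on the userspace side means a driver like
drm/asahi could report 'reboot' while others report 'bus-reset', and the
sysadmin still decides what, if anything, actually gets run.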

> 
> ~~ Lina

