[PATCH v7 1/5] drm: Introduce device wedged event

Raag Jadav raag.jadav at intel.com
Tue Oct 1 05:08:18 UTC 2024


On Mon, Sep 30, 2024 at 03:59:59PM +0300, Andy Shevchenko wrote:
> On Mon, Sep 30, 2024 at 01:08:41PM +0530, Raag Jadav wrote:
> > Introduce device wedged event, which will notify userspace of wedged
> > (hanged/unusable) state of the DRM device through a uevent. This is
> > useful especially in cases where the device is no longer operating as
> > expected even after a hardware reset and has become unrecoverable from
> > driver context.
> > 
> > Purpose of this implementation is to provide drivers a generic way to
> > recover with the help of userspace intervention. Different drivers may
> > have different ideas of a "wedged device" depending on their hardware
> > implementation, and hence the vendor agnostic nature of the event.
> > It is up to the drivers to decide when they see the need for recovery
> > and how they want to recover from the available methods.
> > 
> > Current implementation defines three recovery methods, out of which,
> > drivers can choose to support any one or multiple of them. Preferred
> > recovery method will be sent in the uevent environment as WEDGED=<method>.
> > Userspace consumers (sysadmin) can define udev rules to parse this event
> > and take respective action to recover the device.
> > 
> >     =============== ==================================
> >     Recovery method Consumer expectations
> >     =============== ==================================
> >     rebind          unbind + rebind driver
> >     bus-reset       unbind + reset bus device + rebind
> >     reboot          reboot system
> >     =============== ==================================
> 
> ...
> 
> > +/*
> > + * Available recovery methods for wedged device. To be sent along with device
> > + * wedged uevent.
> > + */
> > +static const char *const drm_wedge_recovery_opts[] = {
> > +	[DRM_WEDGE_RECOVERY_REBIND] = "rebind",
> > +	[DRM_WEDGE_RECOVERY_BUS_RESET] = "bus-reset",
> > +	[DRM_WEDGE_RECOVERY_REBOOT] = "reboot",
> > +};
> 
> Place for static_assert() is here, as it closer to the actual data we test...

Shouldn't it be at the point of access?
If no, why do we care about the data when it's not being used?

> > +static bool drm_wedge_recovery_is_valid(enum drm_wedge_recovery method)
> > +{
> > +	static_assert(ARRAY_SIZE(drm_wedge_recovery_opts) == DRM_WEDGE_RECOVERY_MAX);
> 
> ...it doesn't fully belong to this function (or only to this function).

The purpose of having a helper is to have a single point of access, no?

Side note: It also goes well with is_valid() semantic IMHO.

> > +	return method >= DRM_WEDGE_RECOVERY_REBIND && method < DRM_WEDGE_RECOVERY_MAX;
> > +}
> 
> Why do we need this one-liner (after above comment being addressed) as a
> separate function?

I'm not sure if I'm following you. Method is not a constant here, we'll get it
on the stack.

Raag


More information about the Intel-gfx mailing list