[PATCH v7 1/5] drm: Introduce device wedged event
Raag Jadav
raag.jadav at intel.com
Tue Oct 1 05:08:18 UTC 2024
On Mon, Sep 30, 2024 at 03:59:59PM +0300, Andy Shevchenko wrote:
> On Mon, Sep 30, 2024 at 01:08:41PM +0530, Raag Jadav wrote:
> > Introduce device wedged event, which will notify userspace of wedged
> > (hanged/unusable) state of the DRM device through a uevent. This is
> > useful especially in cases where the device is no longer operating as
> > expected even after a hardware reset and has become unrecoverable from
> > driver context.
> >
> > Purpose of this implementation is to provide drivers a generic way to
> > recover with the help of userspace intervention. Different drivers may
> > have different ideas of a "wedged device" depending on their hardware
> > implementation, and hence the vendor agnostic nature of the event.
> > It is up to the drivers to decide when they see the need for recovery
> > and how they want to recover from the available methods.
> >
> > Current implementation defines three recovery methods, out of which,
> > drivers can choose to support any one or multiple of them. Preferred
> > recovery method will be sent in the uevent environment as WEDGED=<method>.
> > Userspace consumers (sysadmin) can define udev rules to parse this event
> > and take respective action to recover the device.
> >
> > =============== ==================================
> > Recovery method Consumer expectations
> > =============== ==================================
> > rebind unbind + rebind driver
> > bus-reset unbind + reset bus device + rebind
> > reboot reboot system
> > =============== ==================================
>
> ...
>
> > +/*
> > + * Available recovery methods for wedged device. To be sent along with device
> > + * wedged uevent.
> > + */
> > +static const char *const drm_wedge_recovery_opts[] = {
> > + [DRM_WEDGE_RECOVERY_REBIND] = "rebind",
> > + [DRM_WEDGE_RECOVERY_BUS_RESET] = "bus-reset",
> > + [DRM_WEDGE_RECOVERY_REBOOT] = "reboot",
> > +};
>
> Place for static_assert() is here, as it closer to the actual data we test...
Shouldn't it be at the point of access?
If no, why do we care about the data when it's not being used?
> > +static bool drm_wedge_recovery_is_valid(enum drm_wedge_recovery method)
> > +{
> > + static_assert(ARRAY_SIZE(drm_wedge_recovery_opts) == DRM_WEDGE_RECOVERY_MAX);
>
> ...it doesn't fully belong to this function (or only to this function).
The purpose of having a helper is to have a single point of access, no?
Side note: It also goes well with is_valid() semantic IMHO.
> > + return method >= DRM_WEDGE_RECOVERY_REBIND && method < DRM_WEDGE_RECOVERY_MAX;
> > +}
>
> Why do we need this one-liner (after above comment being addressed) as a
> separate function?
I'm not sure if I'm following you. Method is not a constant here, we'll get it
on the stack.
Raag
More information about the Intel-gfx
mailing list