[PATCH v6 1/4] drm: Introduce device wedged event

Raag Jadav raag.jadav at intel.com
Mon Sep 23 14:35:23 UTC 2024


On Mon, Sep 23, 2024 at 11:38:55AM +0300, Andy Shevchenko wrote:
> On Mon, Sep 23, 2024 at 09:28:23AM +0530, Raag Jadav wrote:
> > Introduce device wedged event, which will notify userspace of wedged
> > (hanged/unusable) state of the DRM device through a uevent. This is
> > useful especially in cases where the device is no longer operating as
> > expected and has become unrecoverable from driver context.
> > 
> > Purpose of this implementation is to provide drivers a way to recover
> > through userspace intervention. Different drivers may have different
> > ideas of a "wedged device" depending on their hardware implementation,
> > and hence the vendor agnostic nature of the event. It is up to the drivers
> > to decide when they see the need for recovery and how they want to recover
> > from the available methods.
> > 
> > Current implementation defines three recovery methods, out of which,
> > drivers can choose to support any one or multiple of them. Preferred
> > recovery method will be sent in the uevent environment as WEDGED=<method>.
> > Userspace consumers (sysadmin) can define udev rules to parse this event
> > and take respective action to recover the device.
> > 
> >  Method    | Consumer expectations
> > -----------|-----------------------------------
> >  rebind    | unbind + rebind driver
> >  bus-reset | unbind + reset bus device + rebind
> >  reboot    | reboot system
> 
> > v4: s/drm_dev_wedged/drm_dev_wedged_event
> >     Use drm_info() (Jani)
> >     Kernel doc adjustment (Aravind)
> > v5: Send recovery method with uevent (Lina)
> > v6: Access wedge_recovery_opts[] using helper function (Jani)
> >     Use snprintf() (Jani)
> 
> Hmm... Isn't changelog in the cover letter is not enough?

Which was initial thought but I'm told otherwise ¯\_(ツ)_/¯

> ...
> 
> > +extern const char *const wedge_recovery_opts[];
> 
> It's not NULL terminated. How users will know that they have an index valid?

It's expected to be accessed using recovery_*() helpers.
 
> Either you NULL-terminate that, or export the size as well (personally I would
> go with the first approach).
> 
> ...
> 
> > +static inline bool recovery_method_is_valid(enum wedge_recovery_method method)
> > +{
> > +	if (method >= DRM_WEDGE_RECOVERY_REBIND && method < DRM_WEDGE_RECOVERY_MAX)
> > +		return true;
> > +
> > +	return false;
> 
> Besides that this can be written as
> 
> 	return method >= DRM_WEDGE_RECOVERY_REBIND && method < DRM_WEDGE_RECOVERY_MAX;
> 
> > +}
> 
> this seems a runtime approach for what we have at compile-time, i.e. static_assert()

My understanding is that we have runtime users that the compiler may not be
able to resolve.

Raag


More information about the Intel-gfx mailing list