[PATCH v10 1/4] drm: Introduce device wedged event

Raag Jadav raag.jadav at intel.com
Thu Dec 12 10:37:59 UTC 2024


On Wed, Dec 11, 2024 at 06:14:12PM +0100, Maxime Ripard wrote:
> On Wed, Dec 04, 2024 at 01:17:17PM +0200, Raag Jadav wrote:
> > + misc maintainers
> > 
> > On Tue, Dec 03, 2024 at 11:18:00AM +0100, Christian König wrote:
> > > On 03.12.24 at 06:00, Raag Jadav wrote:
> > > > On Mon, Dec 02, 2024 at 10:07:59AM +0200, Raag Jadav wrote:
> > > > > On Fri, Nov 29, 2024 at 10:40:14AM -0300, André Almeida wrote:
> > > > > > Hi Raag,
> > > > > > 
> > > > > > > On 28/11/2024 12:37, Raag Jadav wrote:
> > > > > > > Introduce device wedged event, which notifies userspace of 'wedged'
> > > > > > > > (hung/unusable) state of the DRM device through a uevent. This is
> > > > > > > useful especially in cases where the device is no longer operating as
> > > > > > > expected and has become unrecoverable from driver context. Purpose of
> > > > > > > this implementation is to provide drivers a generic way to recover with
> > > > > > > the help of userspace intervention without taking any drastic measures
> > > > > > > in the driver.
> > > > > > > 
> > > > > > > A 'wedged' device is basically a dead device that needs attention. The
> > > > > > > uevent is the notification that is sent to userspace along with a hint
> > > > > > > about what could possibly be attempted to recover the device and bring
> > > > > > > it back to usable state. Different drivers may have different ideas of
> > > > > > > a 'wedged' device depending on their hardware implementation, and hence
> > > > > > > the vendor agnostic nature of the event. It is up to the drivers to
> > > > > > > decide when they see the need for device recovery and how they want to
> > > > > > > recover from the available methods.
> > > > > > > 
> > > > > > > Thank you for your work. Do you think you can add an optional PID
> > > > > > > parameter, carrying the PID of the app that caused the reset? For the
> > > > > > > SteamOS use case, it has proved useful to kill the faulty app as well.
> > > > > > > If the reset was caused by a kthread, no PID can be provided, hence
> > > > > > > the parameter is optional.
> > > > > Hmm, I'm not sure if it really fits here since it doesn't seem like
> > > > > a generic usecase.
> > > > > 
> > > > > I'd still be open for it if found useful by the drivers but perhaps
> > > > > as an extended feature in a separate series.
> > > > What do you think Chris, are we good to go with v10?
> > > 
> > > I agree with Andre that the PID and maybe the new DRM client name would be
> > > really nice to have here.
> > > 
> > > We do have that in the device core dump we create, but if an application is
> > > supervised by a daemon, for example, then that would be really useful.
> > > 
> > > On the other hand I think we should merge the documentation and code as is
> > > and then add the PID/name later on. That is essentially a separate
> > > discussion.
> > 
> > So how do we proceed, perhaps through misc tree?
> 
> Provided it follows the usual rules (ie, Reviewed-by, open source
> userspace tools using it if it's a new uAPI, etc.) then yeah, we can
> merge it through drm-misc.

My understanding is that the core patches are to be reviewed by the
maintainers? The rest of it (patches 2 to 4) seems to be reviewed already.

We have a documented example (patch 2) with a udev rule and a reference
script which can be set up to get this working. Does that qualify as
a consumer?
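
For illustration only, here is a minimal sketch of what the decision
logic in such a consumer script could look like. It is not the reference
script from patch 2. It assumes the uevent exposes a "WEDGED" property
holding a comma-separated list of recovery hints, and that "rebind" and
"bus-reset" are among them; the "PID" key is purely hypothetical and
only mirrors the optional parameter proposed earlier in this thread.

```python
# Sketch of a userspace consumer's decision logic for a 'wedged' uevent.
# Assumptions (not confirmed by this thread): the property is named
# "WEDGED", its value is a comma-separated list of hints, and "rebind"
# and "bus-reset" are valid hints. "PID" is a hypothetical key.

# Methods this consumer knows how to act on, least drastic first (assumed).
KNOWN_METHODS = ("rebind", "bus-reset")


def parse_wedged(env):
    """Return the recovery hints found in a uevent environment dict."""
    raw = env.get("WEDGED", "")
    return [m.strip() for m in raw.split(",") if m.strip()]


def pick_recovery(env):
    """Pick the least drastic hinted method we know, or None."""
    hints = parse_wedged(env)
    for method in KNOWN_METHODS:
        if method in hints:
            return method
    return None


def optional_pid(env):
    """Return the offending PID if one was supplied and is valid, else None."""
    raw = env.get("PID")
    if raw is None:
        return None
    try:
        pid = int(raw)
    except ValueError:
        return None
    return pid if pid > 0 else None
```

A udev rule could hand the event environment to a script built around
these helpers, which would then attempt the chosen method and, if a PID
were ever provided, let a supervising daemon decide what to do with it.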

Raag


More information about the Intel-gfx mailing list