[PATCH v10 1/4] drm: Introduce device wedged event

Raag Jadav raag.jadav at intel.com
Wed Dec 4 11:17:17 UTC 2024


+ misc maintainers

On Tue, Dec 03, 2024 at 11:18:00AM +0100, Christian König wrote:
> On 03.12.24 at 06:00, Raag Jadav wrote:
> > On Mon, Dec 02, 2024 at 10:07:59AM +0200, Raag Jadav wrote:
> > > On Fri, Nov 29, 2024 at 10:40:14AM -0300, André Almeida wrote:
> > > > Hi Raag,
> > > > 
> > > > On 28/11/2024 12:37, Raag Jadav wrote:
> > > > > Introduce device wedged event, which notifies userspace of 'wedged'
> > > > > (hung/unusable) state of the DRM device through a uevent. This is
> > > > > useful especially in cases where the device is no longer operating as
> > > > > expected and has become unrecoverable from driver context. The purpose
> > > > > of this implementation is to give drivers a generic way to recover
> > > > > with the help of userspace intervention, without taking any drastic
> > > > > measures in the driver.
> > > > > 
> > > > > A 'wedged' device is basically a dead device that needs attention. The
> > > > > uevent is the notification that is sent to userspace along with a hint
> > > > > about what could possibly be attempted to recover the device and bring
> > > > > it back to usable state. Different drivers may have different ideas of
> > > > > a 'wedged' device depending on their hardware implementation, hence
> > > > > the vendor-agnostic nature of the event. It is up to the drivers to
> > > > > decide when they see the need for device recovery and how they want to
> > > > > recover from the available methods.
> > > > > 
> > > > Thank you for your work. Do you think you can add an optional PID
> > > > parameter, carrying the PID of the app that caused the reset? For the
> > > > SteamOS use case it has proven useful to kill the faulty app as well.
> > > > If the reset was caused by a kthread, no PID can be provided, hence
> > > > the parameter is optional.
> > > Hmm, I'm not sure if it really fits here since it doesn't seem like
> > > a generic use case.
> > > 
> > > I'd still be open to it if the drivers find it useful, but perhaps
> > > as an extended feature in a separate series.
> > What do you think Chris, are we good to go with v10?
> 
> I agree with André that the PID and maybe the new DRM client name would be
> really nice to have here.
> 
> We do have that in the device core dump we create, but if an application is
> supervised by a daemon, for example, then that would be really useful.
> 
> On the other hand I think we should merge the documentation and code as is
> and then add the PID/name later on. That is essentially a separate
> discussion.

So how do we proceed? Perhaps through the misc tree?

Raag

