[PATCH v10 1/4] drm: Introduce device wedged event

Maxime Ripard mripard at kernel.org
Mon Dec 16 16:07:02 UTC 2024


Hi,

On Thu, Dec 12, 2024 at 12:37:59PM +0200, Raag Jadav wrote:
> On Wed, Dec 11, 2024 at 06:14:12PM +0100, Maxime Ripard wrote:
> > On Wed, Dec 04, 2024 at 01:17:17PM +0200, Raag Jadav wrote:
> > > + misc maintainers
> > > 
> > > On Tue, Dec 03, 2024 at 11:18:00AM +0100, Christian König wrote:
> > > > Am 03.12.24 um 06:00 schrieb Raag Jadav:
> > > > > On Mon, Dec 02, 2024 at 10:07:59AM +0200, Raag Jadav wrote:
> > > > > > On Fri, Nov 29, 2024 at 10:40:14AM -0300, André Almeida wrote:
> > > > > > > Hi Raag,
> > > > > > > 
> > > > > > > Em 28/11/2024 12:37, Raag Jadav escreveu:
> > > > > > > > Introduce device wedged event, which notifies userspace of the 'wedged'
> > > > > > > > (hung/unusable) state of the DRM device through a uevent. This is
> > > > > > > > especially useful in cases where the device is no longer operating as
> > > > > > > > expected and has become unrecoverable from driver context. The purpose
> > > > > > > > of this implementation is to provide drivers a generic way to recover
> > > > > > > > with the help of userspace intervention without taking any drastic
> > > > > > > > measures in the driver.
> > > > > > > > 
> > > > > > > > A 'wedged' device is basically a dead device that needs attention. The
> > > > > > > > uevent is the notification that is sent to userspace along with a hint
> > > > > > > > about what could possibly be attempted to recover the device and bring
> > > > > > > > it back to a usable state. Different drivers may have different ideas of
> > > > > > > > a 'wedged' device depending on their hardware implementation, hence
> > > > > > > > the vendor-agnostic nature of the event. It is up to the drivers to
> > > > > > > > decide when they see the need for device recovery and which of the
> > > > > > > > available methods they want to use to recover.
> > > > > > > > 
> > > > > > > Thank you for your work. Do you think you could add an optional PID
> > > > > > > parameter, holding the PID of the app that caused the reset? For the SteamOS
> > > > > > > use case, it has proven useful to kill the faulting app as well. If the reset
> > > > > > > was caused by a kthread, no PID can be provided, hence it's an optional
> > > > > > > parameter.
> > > > > > Hmm, I'm not sure it really fits here, since it doesn't seem like
> > > > > > a generic use case.
> > > > > > 
> > > > > > I'd still be open to it if drivers find it useful, but perhaps
> > > > > > as an extended feature in a separate series.
> > > > > What do you think, Chris? Are we good to go with v10?
> > > > 
> > > > I agree with Andre that the PID and maybe the new DRM client name would be
> > > > really nice to have here.
> > > > 
> > > > We do have that in the device core dump we create, but if an application is
> > > > supervised by a daemon, for example, then that would be really useful.
> > > > 
> > > > On the other hand I think we should merge the documentation and code as is
> > > > and then add the PID/name later on. That is essentially a separate
> > > > discussion.
> > > 
> > > So how do we proceed? Perhaps through the misc tree?
> > 
> > Provided it follows the usual rules (i.e., Reviewed-by, open-source
> > userspace tools using it if it's a new uAPI, etc.), then yeah, we can
> > merge it through drm-misc.
> 
> My understanding is that the core patches are to be reviewed by the
> maintainers? The rest (patches 2 to 4) seems to be already reviewed.
> 
> We have a documented example (patch 2) with a udev rule and a reference
> script which can be set up to get this working. Does that qualify as
> a consumer?

Given the description you stated above, I'd expect a compositor to be
the expected user, right?

Our doc
(https://docs.kernel.org/gpu/drm-uapi.html#open-source-userspace-requirements)
states:

  The open-source userspace must not be a toy/test application, but the
  real thing. Specifically it needs to handle all the usual error and
  corner cases. These are often the places where new uAPI falls apart
  and hence essential to assess the fitness of a proposed interface.

Maxime