[PATCH v3 1/3] drm: Create an app info option for wedge events
Raag Jadav
raag.jadav at intel.com
Sun May 18 21:26:00 UTC 2025
On Mon, May 12, 2025 at 05:34:35PM -0300, André Almeida wrote:
> When a device get wedged, it might be caused by a guilty application.
> For userspace, knowing which app was the cause can be useful for some
> situations, like for implementing a policy, logs or for giving a chance
> for the compositor to let the user know what app caused the problem.
> This is an optional argument, when the app info is not available, the
> PID and APP string won't appear in the event string.
>
> Sometimes just the PID isn't enough giving that the app might be already
> dead by the time userspace will try to check what was this PID's name,
> so to make the life easier also notify what's the app's name in the user
> event.
...
> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> index 3dc7acd56b1d..e30efa4ac330 100644
> --- a/drivers/gpu/drm/drm_drv.c
> +++ b/drivers/gpu/drm/drm_drv.c
> @@ -542,6 +542,7 @@ static const char *drm_get_wedge_recovery(unsigned int opt)
> * drm_dev_wedged_event - generate a device wedged uevent
> * @dev: DRM device
> * @method: method(s) to be used for recovery
> + * @info: optional information about the guilty app
> *
> * This generates a device wedged uevent for the DRM device specified by @dev.
> * Recovery @method\(s) of choice will be sent in the uevent environment as
> @@ -554,13 +555,14 @@ static const char *drm_get_wedge_recovery(unsigned int opt)
> *
> * Returns: 0 on success, negative error code otherwise.
> */
> -int drm_dev_wedged_event(struct drm_device *dev, unsigned long method)
> +int drm_dev_wedged_event(struct drm_device *dev, unsigned long method,
> + struct drm_wedge_app_info *info)
> {
> const char *recovery = NULL;
> unsigned int len, opt;
> /* Event string length up to 28+ characters with available methods */
> - char event_string[32];
> - char *envp[] = { event_string, NULL };
> + char event_string[32], pid_string[15] = "", comm_string[TASK_COMM_LEN] = "";
Let's make these into proper defines for consistency,
#define WEDGE_STR_LEN
#define PID_LEN
and drop redundant comment.
> + char *envp[] = { event_string, NULL, NULL, NULL };
>
> len = scnprintf(event_string, sizeof(event_string), "%s", "WEDGED=");
>
> @@ -582,6 +584,13 @@ int drm_dev_wedged_event(struct drm_device *dev, unsigned long method)
> drm_info(dev, "device wedged, %s\n", method == DRM_WEDGE_RECOVERY_NONE ?
> "but recovered through reset" : "needs recovery");
>
> + if (info) {
> + snprintf(pid_string, sizeof(pid_string), "PID=%u", info->pid);
> + snprintf(comm_string, sizeof(comm_string), "APP=%s", info->comm);
Since this is a core helper, we'd want to make sure these fields are valid
and not being abused.
Also s/APP/TASK IMHO, but upto you.
Raag
More information about the Intel-gfx
mailing list