[PATCH v2 2/5] drm/xe: Add a helper function to set recovery method
Riana Tauro
riana.tauro at intel.com
Mon Jun 23 10:01:05 UTC 2025
Add a helper function to set the wedged recovery method. The recovery
method has to be set before declaring the device wedged and sending the
drm wedged uevent. If no method is set, the default method (unbind/re-bind
or bus reset, i.e. DRM_WEDGE_RECOVERY_REBIND | DRM_WEDGE_RECOVERY_BUS_RESET)
is used.
v2: wrap to 80 characters
    move gt wedged before uevent
    fix kernel-doc (Raag)
Signed-off-by: Riana Tauro <riana.tauro at intel.com>
---
drivers/gpu/drm/xe/xe_device.c | 48 ++++++++++++++++++----------
drivers/gpu/drm/xe/xe_device.h | 1 +
drivers/gpu/drm/xe/xe_device_types.h | 2 ++
3 files changed, 35 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 8cfcfff250ca..eaff79257d85 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -1121,20 +1121,33 @@ static void xe_device_wedged_fini(struct drm_device *drm, void *arg)
xe_pm_runtime_put(xe);
}
+/**
+ * xe_device_set_wedged_method - Set wedged recovery method
+ * @xe: xe device instance
+ * @method: recovery method to set
+ *
+ * Set the recovery method to be sent in the drm wedged uevent.
+ */
+void xe_device_set_wedged_method(struct xe_device *xe, unsigned long method)
+{
+ xe->wedged.method = method;
+}
+
/**
* xe_device_declare_wedged - Declare device wedged
* @xe: xe device instance
*
- * This is a final state that can only be cleared with a module
- * re-probe (unbind + bind).
- * In this state every IOCTL will be blocked so the GT cannot be used.
- * In general it will be called upon any critical error such as gt reset
- * failure or guc loading failure. Userspace will be notified of this state
- * through device wedged uevent.
- * If xe.wedged module parameter is set to 2, this function will be called
- * on every single execution timeout (a.k.a. GPU hang) right after devcoredump
- * snapshot capture. In this mode, GT reset won't be attempted so the state of
- * the issue is preserved for further debugging.
+ * This is a final state that can only be cleared with the recovery method
+ * specified in the drm wedged uevent. The method must be set with
+ * xe_device_set_wedged_method() before the device is declared wedged;
+ * otherwise the default (unbind/re-bind or bus reset) is sent. In this state
+ * every IOCTL is blocked, so the GT cannot be used. In general this is called
+ * upon a critical error such as a GT reset failure, a GuC loading failure or
+ * a firmware failure. Userspace is notified of this state through the drm
+ * wedged uevent. If the xe.wedged module parameter is set to 2, this function
+ * is called on every execution timeout (a.k.a. GPU hang) right after the
+ * devcoredump snapshot capture. In this mode, GT reset won't be attempted, so
+ * the state of the issue is preserved for further debugging.
*/
void xe_device_declare_wedged(struct xe_device *xe)
{
@@ -1153,6 +1166,14 @@ void xe_device_declare_wedged(struct xe_device *xe)
return;
}
+ /* If no wedge recovery method is set, use default */
+ if (!xe->wedged.method)
+ xe_device_set_wedged_method(xe, DRM_WEDGE_RECOVERY_REBIND |
+ DRM_WEDGE_RECOVERY_BUS_RESET);
+
+ for_each_gt(gt, xe, id)
+ xe_gt_declare_wedged(gt);
+
if (!atomic_xchg(&xe->wedged.flag, 1)) {
xe->needs_flr_on_fini = true;
drm_err(&xe->drm,
@@ -1162,11 +1183,6 @@ void xe_device_declare_wedged(struct xe_device *xe)
dev_name(xe->drm.dev));
/* Notify userspace of wedged device */
- drm_dev_wedged_event(&xe->drm,
- DRM_WEDGE_RECOVERY_REBIND | DRM_WEDGE_RECOVERY_BUS_RESET,
- NULL);
+ drm_dev_wedged_event(&xe->drm, xe->wedged.method, NULL);
}
-
- for_each_gt(gt, xe, id)
- xe_gt_declare_wedged(gt);
}
diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
index e4da797a984b..40518af2333b 100644
--- a/drivers/gpu/drm/xe/xe_device.h
+++ b/drivers/gpu/drm/xe/xe_device.h
@@ -191,6 +191,7 @@ static inline bool xe_device_wedged(struct xe_device *xe)
}
void xe_device_declare_wedged(struct xe_device *xe);
+void xe_device_set_wedged_method(struct xe_device *xe, unsigned long method);
struct xe_file *xe_file_get(struct xe_file *xef);
void xe_file_put(struct xe_file *xef);
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 003afb279a5e..d898238e8994 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -564,6 +564,8 @@ struct xe_device {
atomic_t flag;
/** @wedged.mode: Mode controlled by kernel parameter and debugfs */
int mode;
+ /** @wedged.method: Recovery method to be sent in the drm device wedged uevent */
+ unsigned long method;
} wedged;
/** @bo_device: Struct to control async free of BOs */
--
2.47.1