[PATCH v4 5/9] drm/xe/xe_survivability: Add support for Runtime survivability mode
Umesh Nerlige Ramappa
umesh.nerlige.ramappa at intel.com
Wed Jul 9 23:44:44 UTC 2025
On Wed, Jul 09, 2025 at 04:50:17PM +0530, Riana Tauro wrote:
>Certain runtime firmware errors can cause the device to be in an unusable
>state requiring a firmware flash to restore normal operation.
>Runtime Survivability Mode indicates that a firmware flash is necessary by
>wedging the device and exposing the survivability mode sysfs entry.
>
>The sysfs entry below indicates that the device is in survivability mode:
>
>/sys/bus/pci/devices/<device>/survivability_mode
>
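For reference, here is a minimal userspace sketch of consuming this attribute. It assumes only the first-line format emitted by survivability_mode_show() ("Survivability mode type: %s"). The helper names and the 256-byte read buffer are hypothetical choices, not part of the patch:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract the type label ("Boot", "Runtime", ...)
 * from the first line of the survivability_mode attribute contents.
 * Returns 0 on success, -1 if the expected prefix is missing.
 */
static int parse_survivability_type(const char *buf, char *out, size_t out_len)
{
	static const char prefix[] = "Survivability mode type: ";
	size_t plen = sizeof(prefix) - 1;
	size_t n;

	if (out_len == 0 || strncmp(buf, prefix, plen) != 0)
		return -1;

	buf += plen;
	n = strcspn(buf, "\n");		/* the label runs to the end of the line */
	if (n >= out_len)
		n = out_len - 1;	/* truncate to fit the caller's buffer */
	memcpy(out, buf, n);
	out[n] = '\0';
	return 0;
}

/*
 * Hypothetical helper: read /sys/bus/pci/devices/<device>/survivability_mode.
 * A missing file is reported as -1, since the attribute only exists while
 * the device is in survivability mode.
 */
static int read_survivability_type(const char *path, char *out, size_t out_len)
{
	char buf[256];
	size_t len;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	len = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[len] = '\0';
	return parse_survivability_type(buf, out, out_len);
}
```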
>Signed-off-by: Riana Tauro <riana.tauro at intel.com>
>---
> drivers/gpu/drm/xe/xe_survivability_mode.c | 42 ++++++++++++++++++-
> drivers/gpu/drm/xe/xe_survivability_mode.h | 1 +
> .../gpu/drm/xe/xe_survivability_mode_types.h | 1 +
> 3 files changed, 43 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/xe/xe_survivability_mode.c b/drivers/gpu/drm/xe/xe_survivability_mode.c
>index fefb027b1c84..ca1cfa13525a 100644
>--- a/drivers/gpu/drm/xe/xe_survivability_mode.c
>+++ b/drivers/gpu/drm/xe/xe_survivability_mode.c
>@@ -137,7 +137,8 @@ static ssize_t survivability_mode_show(struct device *dev,
> struct xe_survivability_info *info = survivability->info;
> int index = 0, count = 0;
>
>- count += sysfs_emit_at(buff, count, "Survivability mode type: Boot\n");
>+ count += sysfs_emit_at(buff, count, "Survivability mode type: %s\n",
>+ survivability->type ? "Runtime" : "Boot");
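Minor: the ternary relies on XE_SURVIVABILITY_TYPE_BOOT being 0. Below is a minimal user-space sketch of an explicit enum-to-label table, which keeps working if more types are added later. survivability_type_to_str() and show_type() are hypothetical names, and snprintf() stands in for sysfs_emit_at():

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Mirrors the enum this patch adds to xe_survivability_mode_types.h. */
enum xe_survivability_type {
	XE_SURVIVABILITY_TYPE_BOOT,
	XE_SURVIVABILITY_TYPE_RUNTIME,
};

/* Explicit labels, independent of which enumerator happens to be zero. */
static const char *const survivability_type_names[] = {
	[XE_SURVIVABILITY_TYPE_BOOT] = "Boot",
	[XE_SURVIVABILITY_TYPE_RUNTIME] = "Runtime",
};

static const char *survivability_type_to_str(enum xe_survivability_type type)
{
	size_t n = sizeof(survivability_type_names) /
		   sizeof(survivability_type_names[0]);

	/* Unknown values get a fallback label instead of an out-of-bounds read. */
	if ((size_t)type >= n || !survivability_type_names[type])
		return "Unknown";
	return survivability_type_names[type];
}

/* User-space stand-in for the show path; snprintf() replaces sysfs_emit_at(). */
static int show_type(char *buf, size_t len, enum xe_survivability_type type)
{
	assert(buf && len > 0);
	return snprintf(buf, len, "Survivability mode type: %s\n",
			survivability_type_to_str(type));
}
```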
>
> if (!check_boot_failure(xe))
> return count;
>@@ -288,6 +289,45 @@ bool xe_survivability_mode_is_requested(struct xe_device *xe)
> return check_boot_failure(xe);
> }
>
>+/**
>+ * xe_survivability_mode_runtime_enable - Initialize and enable runtime survivability mode
>+ * @xe: xe device instance
>+ *
>+ * Initialize survivability information and enable runtime survivability mode.
>+ * Runtime survivability mode is enabled when certain errors leave the device
>+ * in a non-recoverable state. The device is declared wedged with the appropriate
>+ * recovery method and the survivability mode sysfs is exposed to userspace.
>+ *
>+ * Return: 0 if runtime survivability mode is enabled or not requested, negative error
Is the "not requested" case still applicable here? I don't see a path that
returns 0 without enabling the mode.
>+ * code otherwise.
>+ */
>+int xe_survivability_mode_runtime_enable(struct xe_device *xe)
>+{
>+ struct xe_survivability *survivability = &xe->survivability;
>+ struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
>+ int ret;
>+
>+ if (!IS_DGFX(xe) || IS_SRIOV_VF(xe) || xe->info.platform < XE_BATTLEMAGE) {
Would this condition be better handled with a has_runtime_survivability
flag for the platforms that support it?
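Here is a minimal sketch of what that could look like. It assumes the flag lives in the per-platform descriptor and is copied into device info at probe time. All struct and function names below are hypothetical, simplified stand-ins, not the existing xe types:

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified stand-ins for the platform descriptor and the
 * device. Only has_runtime_survivability is the proposed new flag.
 */
struct sketch_platform_desc {
	unsigned int has_runtime_survivability:1;
};

struct sketch_xe_device {
	const struct sketch_platform_desc *desc;
	bool is_sriov_vf;
	struct {
		bool has_runtime_survivability;
	} info;
};

/* Each supporting platform opts in once, in its descriptor. */
static const struct sketch_platform_desc sketch_bmg_desc = {
	.has_runtime_survivability = 1,
};

/* Integrated parts simply leave the flag unset. */
static const struct sketch_platform_desc sketch_igpu_desc = {
	.has_runtime_survivability = 0,
};

/* Probe-time copy into info; a VF never gets the capability. */
static void sketch_init_info(struct sketch_xe_device *xe)
{
	xe->info.has_runtime_survivability =
		xe->desc->has_runtime_survivability && !xe->is_sriov_vf;
}

/* Would replace the open-coded DGFX/VF/platform-ordering check at the call site. */
static bool sketch_runtime_survivability_supported(const struct sketch_xe_device *xe)
{
	return xe->info.has_runtime_survivability;
}
```

The call site would then reduce to a single flag check, and enabling a new platform only touches its descriptor.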
>+ dev_err(&pdev->dev, "Runtime Survivability Mode not supported\n");
>+ return -EINVAL;
>+ }
>+
>+ ret = init_survivability_mode(xe);
>+ if (ret)
>+ return ret;
>+
>+ ret = create_survivability_sysfs(pdev);
>+ if (ret)
>+ dev_err(&pdev->dev, "Failed to create survivability mode sysfs\n");
You do not return ret in the if block above. Is that intentional?
Regards,
Umesh
>+
>+ survivability->type = XE_SURVIVABILITY_TYPE_RUNTIME;
>+ xe_device_set_wedged_method(xe, DRM_WEDGE_RECOVERY_VENDOR);
>+ xe_device_declare_wedged(xe);
>+
>+ dev_err(&pdev->dev, "Runtime Survivability mode enabled\n");
>+ return 0;
>+}
>+
> /**
> * xe_survivability_mode_boot_enable - Initialize and enable boot survivability mode
> * @xe: xe device instance
>diff --git a/drivers/gpu/drm/xe/xe_survivability_mode.h b/drivers/gpu/drm/xe/xe_survivability_mode.h
>index f6ee283ea5e8..1cc94226aa82 100644
>--- a/drivers/gpu/drm/xe/xe_survivability_mode.h
>+++ b/drivers/gpu/drm/xe/xe_survivability_mode.h
>@@ -11,6 +11,7 @@
> struct xe_device;
>
> int xe_survivability_mode_boot_enable(struct xe_device *xe);
>+int xe_survivability_mode_runtime_enable(struct xe_device *xe);
> bool xe_survivability_mode_is_boot_enabled(struct xe_device *xe);
> bool xe_survivability_mode_is_requested(struct xe_device *xe);
>
>diff --git a/drivers/gpu/drm/xe/xe_survivability_mode_types.h b/drivers/gpu/drm/xe/xe_survivability_mode_types.h
>index 5dce393498da..cd65a5d167c9 100644
>--- a/drivers/gpu/drm/xe/xe_survivability_mode_types.h
>+++ b/drivers/gpu/drm/xe/xe_survivability_mode_types.h
>@@ -11,6 +11,7 @@
>
> enum xe_survivability_type {
> XE_SURVIVABILITY_TYPE_BOOT,
>+ XE_SURVIVABILITY_TYPE_RUNTIME,
> };
>
> struct xe_survivability_info {
>--
>2.47.1
>