[PATCH] drm/xe/gsc: Wedge the device if the GSCCS reset fails
Daniele Ceraolo Spurio
daniele.ceraolospurio at intel.com
Wed Aug 28 22:14:57 UTC 2024
Due to the special handling of the GSCCS in HW, we can't escalate to GT
reset when we receive the reset failure interrupt; the specs indicate
that we should trigger an FLR instead, but we do not have support for
that at the moment, so the HW will stay permanently in a broken state.
We should therefore mark the device as wedged, the same as if the GT
reset had failed.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
---
drivers/gpu/drm/xe/xe_gsc.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_gsc.c b/drivers/gpu/drm/xe/xe_gsc.c
index 648786afffe0..6fbea70d3d36 100644
--- a/drivers/gpu/drm/xe/xe_gsc.c
+++ b/drivers/gpu/drm/xe/xe_gsc.c
@@ -335,9 +335,11 @@ static int gsc_er_complete(struct xe_gt *gt)
if (er_status == GSCI_TIMER_STATUS_TIMER_EXPIRED) {
/*
* XXX: we should trigger an FLR here, but we don't have support
- * for that yet.
+ * for that yet. Since we can't recover from the error, we
+ * declare the device as wedged.
*/
xe_gt_err(gt, "GSC ER timed out!\n");
+ xe_device_declare_wedged(gt_to_xe(gt));
return -EIO;
}
--
2.43.0
More information about the Intel-xe
mailing list