[PATCH] drm/xe/gsc: Wedge the device if the GSCCS reset fails

Daniele Ceraolo Spurio daniele.ceraolospurio at intel.com
Wed Aug 28 22:14:57 UTC 2024


Due to the special handling of the GSCCS in HW, we can't escalate to GT
reset when we receive the reset failure interrupt; the specs indicate
that we should trigger an FLR instead, but we do not have support for
that at the moment, so the HW will stay permanently in a broken state.
We should therefore mark the device as wedged, the same as if the GT
reset had failed.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
---
 drivers/gpu/drm/xe/xe_gsc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_gsc.c b/drivers/gpu/drm/xe/xe_gsc.c
index 648786afffe0..6fbea70d3d36 100644
--- a/drivers/gpu/drm/xe/xe_gsc.c
+++ b/drivers/gpu/drm/xe/xe_gsc.c
@@ -335,9 +335,11 @@ static int gsc_er_complete(struct xe_gt *gt)
 	if (er_status == GSCI_TIMER_STATUS_TIMER_EXPIRED) {
 		/*
 		 * XXX: we should trigger an FLR here, but we don't have support
-		 * for that yet.
+		 * for that yet. Since we can't recover from the error, we
+		 * declare the device as wedged.
 		 */
 		xe_gt_err(gt, "GSC ER timed out!\n");
+		xe_device_declare_wedged(gt_to_xe(gt));
 		return -EIO;
 	}
 
-- 
2.43.0



More information about the Intel-xe mailing list