[PATCH 4/4] drm/xe/guc: Don't treat GuC generic CAT error as protocol error

Michal Wajdeczko michal.wajdeczko at intel.com
Tue Nov 5 17:30:32 UTC 2024


GuC uses GUC_ID_MAX if it cannot map the CAT fault to any PF/VF
context. We shouldn't treat that as a G2H protocol error that would
justify a GT reset, as it may happen due to some VF activity.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko at intel.com>
Cc: Matthew Brost <matthew.brost at intel.com>
---
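Note (not part of the commit): the control flow described above can be
modelled in user space. This is only a rough sketch under stated
assumptions, not the driver code: the value of GUC_ID_MAX is assumed
here, and model_queue_registered() and model_cat_error_handler() are
hypothetical stand-ins for g2h_exec_queue_lookup() and the real handler.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed sentinel value; the real definition lives in the xe GuC headers. */
#define GUC_ID_MAX		65535u
/* Hypothetical: pretend only guc_ids 0..3 have a registered exec queue. */
#define MODEL_NUM_QUEUES	4u

/* Hypothetical stand-in for g2h_exec_queue_lookup(): non-zero if known. */
static int model_queue_registered(uint32_t guc_id)
{
	return guc_id < MODEL_NUM_QUEUES;
}

/* Model of the handler: returns 0 on success, -EPROTO on protocol error. */
static int model_cat_error_handler(const uint32_t *msg, uint32_t len)
{
	uint32_t guc_id;

	/* A message without a guc_id is malformed. */
	if (len < 1)
		return -EPROTO;

	guc_id = msg[0];

	/*
	 * Sentinel: the fault could not be mapped to any context.
	 * Log it and return success so the caller does not escalate.
	 */
	if (guc_id == GUC_ID_MAX) {
		fprintf(stderr, "Memory CAT error reported by GuC!\n");
		return 0;
	}

	/* Any other unknown guc_id is still treated as a protocol error. */
	if (!model_queue_registered(guc_id))
		return -EPROTO;

	return 0;
}
```

In this model only the sentinel is exempted; an unknown guc_id other
than GUC_ID_MAX still returns -EPROTO, which matches the intent of the
patch to leave the existing lookup failure path unchanged.
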
 drivers/gpu/drm/xe/xe_guc_submit.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 147000fd1177..696f8884040a 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -2023,6 +2023,15 @@ int xe_guc_exec_queue_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
 
 	guc_id = msg[0];
 
+	if (guc_id == GUC_ID_MAX) {
+		/*
+		 * GuC uses GUC_ID_MAX if it cannot map the CAT fault to any PF/VF
+		 * context. Only PF will be notified about that.
+		 */
+		xe_gt_err_ratelimited(gt, "Memory CAT error reported by GuC!\n");
+		return 0;
+	}
+
 	q = g2h_exec_queue_lookup(guc, guc_id);
 	if (unlikely(!q))
 		return -EPROTO;
-- 
2.43.0


