[PATCH v4 7/7] drm/xe: Wire devcoredump to LR TDR

Matthew Brost matthew.brost at intel.com
Thu Nov 14 02:25:22 UTC 2024


LR queues can hang, cause engine reset, or cause IOMMU CAT errors.
Collect an error capture when this occurs.

v2:
 - s/queue's/queues (Jonathan)
v4:
 - Fix build (CI)

Signed-off-by: Matthew Brost <matthew.brost at intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
---
 drivers/gpu/drm/xe/xe_guc_submit.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 985a21a72da4..f9ecee5364d8 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -896,13 +896,18 @@ static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w)
 					 !exec_queue_pending_disable(q) ||
 					 xe_guc_read_stopped(guc), HZ * 5);
 		if (!ret) {
-			xe_gt_warn(q->gt, "Schedule disable failed to respond\n");
+			xe_gt_warn(q->gt, "Schedule disable failed to respond, guc_id=%d\n",
+				   q->guc->id);
+			xe_devcoredump(q, NULL);
 			xe_sched_submission_start(sched);
 			xe_gt_reset_async(q->gt);
 			return;
 		}
 	}
 
+	if (!exec_queue_killed(q) && !xe_lrc_ring_is_idle(q->lrc[0]))
+		xe_devcoredump(q, NULL);
+
 	xe_sched_submission_start(sched);
 }
 
-- 
2.34.1



More information about the Intel-xe mailing list