[Intel-gfx] [RFC PATCH] drm/i915/guc: Fix premature release of context on reset

Janusz Krzysztofik janusz.krzysztofik at linux.intel.com
Wed Jul 24 15:05:25 UTC 2019


When using GuC submission, some execlists originated helper functions
are reused.  One of them, used inside guc_reset() and
guc_cancel_requests() callbacks introduced by commit 292ad25c22d9
("drm/i915/guc: Implement reset locally"), unfortunately calls
execlists_schedule_out() helper instead of its GuC specific equivalent.
As execlists functions maintain context references for themselves, that
helper releases a context associated with a request being processed as
soon as the context inflight queue becomes empty.  Since GuC submission
doesn't keep extra context references, possibly still active contexts
may be released prematurely, resulting in kernel panic.

Fix it by providing a local, modified copy of
execlists_cancel_port_requests() helper.

Fixes: 292ad25c22d9 ("drm/i915/guc: Implement reset locally")
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik at linux.intel.com>
---
Based on drm-intel-next-queued as of 2019-07-24, can be rebased easily
on top of drm-intel-next-2019-07-08 if needed (source file moved).

 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index a0f2a01365bc..b1ea7a818d61 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -651,6 +651,21 @@ static void guc_reset_prepare(struct intel_engine_cs *engine)
 	__tasklet_disable_sync_once(&execlists->tasklet);
 }
 
+static void
+cancel_port_requests(struct intel_engine_execlists * const execlists)
+{
+	struct i915_request * const *port, *rq;
+
+	for (port = execlists->pending; (rq = *port); port++)
+		schedule_out(rq);
+	memset(execlists->pending, 0, sizeof(execlists->pending));
+
+	for (port = execlists->active; (rq = *port); port++)
+		schedule_out(rq);
+	execlists->active =
+		memset(execlists->inflight, 0, sizeof(execlists->inflight));
+}
+
 static void guc_reset(struct intel_engine_cs *engine, bool stalled)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
@@ -659,7 +674,7 @@ static void guc_reset(struct intel_engine_cs *engine, bool stalled)
 
 	spin_lock_irqsave(&engine->active.lock, flags);
 
-	execlists_cancel_port_requests(execlists);
+	cancel_port_requests(execlists);
 
 	/* Push back any incomplete requests for replay after the reset. */
 	rq = execlists_unwind_incomplete_requests(execlists);
@@ -702,7 +717,7 @@ static void guc_cancel_requests(struct intel_engine_cs *engine)
 	spin_lock_irqsave(&engine->active.lock, flags);
 
 	/* Cancel the requests on the HW and clear the ELSP tracker. */
-	execlists_cancel_port_requests(execlists);
+	cancel_port_requests(execlists);
 
 	/* Mark all executing requests as skipped. */
 	list_for_each_entry(rq, &engine->active.requests, sched.link) {
-- 
2.21.0



More information about the Intel-gfx mailing list