[Intel-gfx] [CI] drm/i915/tgl: Magic interrupt shadow to relieve some random lockups

Chris Wilson chris at chris-wilson.co.uk
Mon Sep 30 19:49:28 UTC 2019


My current theory is that this masks interrupt delivery to the local CPU
during a critical phase. This is purely papering over the symptoms, using
a delay plucked out of thin air from testing on tgl1-gem and refined
slightly by then waiting for the next ack (although, technically, the
next CS event may not be the one corresponding to this submit; it could
be an intermediate completion).

Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
Cc: Andi Shyti <andi.shyti at intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala at linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index fa385218ce92..6b995017571d 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1186,6 +1186,28 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
 	/* we need to manually load the submit queue */
 	if (execlists->ctrl_reg)
 		writel(EL_CTRL_LOAD, execlists->ctrl_reg);
+
+	/*
+	 * Now this is evil magic.
+	 *
+	 * Adding the same udelay() to process_csb before we clear
+	 * execlists->pending (that is after we receive the HW ack for this
+	 * submit and before we can submit again) does not relieve the symptoms
+	 * (machine lockup). So is the active difference here the wait under
+	 * the irq-off spinlock? That gives more credence to the theory that
+	 * the issue is interrupt delivery. Also note that we still rely on
+	 * disabling RPS, again that seems like an issue with simultaneous
+	 * GT interrupts being delivered to the same CPU.
+	 */
+	if (IS_TIGERLAKE(engine->i915)) {
+		u64 start = local_clock();
+
+		udelay(250);
+
+		while (READ_ONCE(*execlists->pending) && /* wait for ack */
+		       (local_clock() - start) >> 22 == 0) /* ~4ms timeout */
+			cpu_relax();
+	}
 }
 
 static bool ctx_single_port_submission(const struct intel_context *ce)
-- 
2.23.0
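The wait added in execlists_submit_ports() is a fixed delay followed by a bounded poll: spin until the pending slot reads as zero (the HW ack), giving up once `(local_clock() - start) >> 22` becomes non-zero. Since local_clock() counts nanoseconds, that shift is non-zero exactly when at least 2^22 ns = 4,194,304 ns (about 4.19 ms) have elapsed, which is where the "~4ms timeout" comment comes from. Below is a minimal userspace sketch of the same pattern, assuming C11 atomics and POSIX clock_gettime(CLOCK_MONOTONIC) as stand-ins for READ_ONCE(), local_clock() and udelay(). The helpers now_ns(), spin_delay_ns() and wait_for_ack() are hypothetical illustrations, not driver code, and unlike the patch the sketch reports whether the ack arrived.

```c
#define _POSIX_C_SOURCE 199309L

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for local_clock(): monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical stand-in for udelay(): busy-wait for at least ns nanoseconds. */
static void spin_delay_ns(uint64_t ns)
{
	uint64_t start = now_ns();

	while (now_ns() - start < ns)
		;
}

/*
 * Hypothetical helper mirroring the patch: a fixed 250us delay, then
 * spin until *pending reads as zero (the "ack") or until 2^22 ns
 * (~4.19 ms) have passed since entry. Returns true if the ack was seen.
 */
static bool wait_for_ack(atomic_int *pending)
{
	uint64_t start = now_ns();

	spin_delay_ns(250 * 1000);	/* like udelay(250) */

	while (atomic_load_explicit(pending, memory_order_relaxed)) {
		/* (elapsed >> 22) != 0 exactly when elapsed >= 4194304 ns */
		if ((now_ns() - start) >> 22)
			return false;	/* timed out without an ack */
	}
	return true;
}
```

With the flag already clear, wait_for_ack() returns true after the fixed delay; with the flag never cleared, it returns false after roughly 4.19 ms. Using a power-of-two shift rather than a comparison against a decimal constant is just a cheap way to express a rough timeout, which is all the patch needs.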


