[Intel-gfx] [CI] drm/i915/tgl: Magic interrupt shadow to relieve some random lockups
Chris Wilson
chris at chris-wilson.co.uk
Mon Sep 30 19:49:28 UTC 2019
My current theory is that this masks interrupt delivery to the local CPU
during a critical phase. This purely papers over the symptoms with a
delay plucked out of thin air from testing on tgl1-gem, refined slightly
by then waiting for the next ack (though technically the next CS event
may be an intermediate completion rather than the event corresponding to
this submit).
Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
Cc: Andi Shyti <andi.shyti at intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala at linux.intel.com>
---
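A note on the "~4ms timeout" in the hunk below: local_clock() returns
nanoseconds, so "(local_clock() - start) >> 22 == 0" holds while fewer
than 2^22 ns (4194304 ns, about 4.19 ms) have elapsed since start. The
udelay(250) is counted against that budget, leaving roughly 3.94 ms for
the ack poll. A minimal userspace sketch of the arithmetic follows; the
helper names are hypothetical and are not part of the patch or of i915.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): the timeout, in nanoseconds,
 * implied by comparing an elapsed-ns value shifted right by 'shift'
 * against zero.
 */
static uint64_t shift_timeout_ns(unsigned int shift)
{
	assert(shift < 64);
	return UINT64_C(1) << shift;
}

/*
 * Hypothetical helper (not in the patch): true once at least 2^shift ns
 * separate start_ns and now_ns. This is the negation of the loop
 * condition "(now - start) >> shift == 0" used in the patch.
 */
static bool shift_timeout_expired(uint64_t start_ns, uint64_t now_ns,
				  unsigned int shift)
{
	assert(shift < 64);
	return ((now_ns - start_ns) >> shift) != 0;
}
```

With shift = 22 the check trips at exactly 4194304 ns; the shift avoids
a 64-bit compare against a named constant at the cost of a timeout that
can only be a power of two.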
drivers/gpu/drm/i915/gt/intel_lrc.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index fa385218ce92..6b995017571d 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1186,6 +1186,28 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
/* we need to manually load the submit queue */
if (execlists->ctrl_reg)
writel(EL_CTRL_LOAD, execlists->ctrl_reg);
+
+ /*
+ * Now this is evil magic.
+ *
+ * Adding the same udelay() to process_csb before we clear
+ * execlists->pending (that is after we receive the HW ack for this
+ * submit and before we can submit again) does not relieve the symptoms
+ * (machine lockup). So is the active difference here the wait under
+ * the irq-off spinlock? That gives more credence to the theory that
+ * the issue is interrupt delivery. Also note that we still rely on
+ * disabling RPS, again that seems like an issue with simultaneous
+ * GT interrupts being delivered to the same CPU.
+ */
+ if (IS_TIGERLAKE(engine->i915)) {
+ u64 start = local_clock();
+
+ udelay(250);
+
+ while (READ_ONCE(*execlists->pending) && /* wait for ack */
+ (local_clock() - start) >> 22 == 0) /* ~4ms timeout */
+ cpu_relax();
+ }
}
static bool ctx_single_port_submission(const struct intel_context *ce)
--
2.23.0