[Intel-gfx] [PATCH] drm/i915/tgl: Magic udelay to relieve the random lockups with multiple engines
Chris Wilson
chris at chris-wilson.co.uk
Sat Sep 28 10:01:45 UTC 2019
My current theory is that the delay masks interrupt delivery to the
local CPU during a critical phase. This purely papers over the symptoms
with a delay plucked out of thin air from testing on tgl1-gem.
Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
Cc: Andi Shyti <andi.shyti at intel.com>
---
drivers/gpu/drm/i915/gt/intel_lrc.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index fa385218ce92..fe8f4625f04f 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1186,6 +1186,21 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
/* we need to manually load the submit queue */
if (execlists->ctrl_reg)
writel(EL_CTRL_LOAD, execlists->ctrl_reg);
+
+ /*
+ * Now this is evil magic.
+ *
+ * Adding the same udelay() to process_csb before we clear
+ * execlists->pending (that is after we receive the HW ack for this
+ * submit and before we can submit again) does not relieve the symptoms
+ * (machine lockup). So is the active difference here the wait under
+ * the irq-off spinlock? That gives more credence to the theory that
+ * the issue is interrupt delivery. Also note that we still rely on
+ * disabling RPS, again that seems like an issue with simultaneous
+ * GT interrupts being delivered to the same CPU.
+ */
+ if (IS_TIGERLAKE(engine->i915))
+ udelay(250);
}
static bool ctx_single_port_submission(const struct intel_context *ce)
--
2.23.0