[Intel-gfx] [PATCH] drm/i915/execlists: Skip lite restore on the currently executing request

Wed Apr 25 10:59:23 UTC 2018

When WaIdleLiteRestore isn't enough.

Fixes an odd hang on gen8 (both bsw and bdw) during gem_ctx_switch,
where, to all intents and purposes, if we trigger a lite-restore while
the HW is processing the pipecontrol flushes, the RING is restored to
the oword following the command and tries to execute the destination
address for the pipecontrol rather than a valid command. The theory is
that the HW doesn't like RING_HEAD being within a cacheline of the
restored RING_TAIL; we can evade that issue by not triggering a
lite-restore if we know we are already inside the last request.

Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)
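
Note for readers: the check added in the hunk below compares the engine's
current seqno against last->global_seqno - 1, i.e. it asks whether the
request before the last one has completed, which implies the HW has moved
on to the last request. A minimal standalone sketch of that comparison,
assuming the usual wrap-safe signed-difference idiom for 32-bit seqnos;
seqno_passed() and last_request_started() are hypothetical names used
only for illustration and are not part of the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Wrap-safe "seq1 is at or after seq2" for 32-bit sequence numbers.
 * Assumption: the two values are never more than 2^31 apart, so the
 * signed difference keeps the right sign across a u32 wraparound.
 */
static bool seqno_passed(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) >= 0;
}

/*
 * Hypothetical helper: true once the HW-reported seqno has reached the
 * request preceding last_seqno, i.e. the last queued request is
 * presumed to be executing (or already complete).
 */
static bool last_request_started(uint32_t hw_seqno, uint32_t last_seqno)
{
	return seqno_passed(hw_seqno, last_seqno - 1);
}
```

With last_seqno == 10, an hw_seqno of 9 or 10 makes the helper return
true (skip the lite-restore), while 8 returns false (the last request
has not been started yet, so the resubmission proceeds).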

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 029901a8fa38..5c50263e45d3 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -639,6 +639,19 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		if (port_count(&port[1]))
 			goto unlock;
 
+		/*
+		 * Skip invoking a lite-restore if we know we have already
+		 * started processing the last request queued to HW. This
+		 * prevents a mystery *unrecoverable* hang on gen8, maybe
+		 * related to updating TAIL within a cacheline of HEAD? (As
+		 * there is still a delay between submitting the ELSP update
+		 * and HW responding, we may still encounter whatever condition
+		 * trips up, just less often.)
+		 */
+		if (i915_seqno_passed(intel_engine_get_seqno(engine),
+				      last->global_seqno - 1))
+			goto unlock;
+
 		/*
 		 * WaIdleLiteRestore:bdw,skl
 		 * Apply the wa NOOPs to prevent
-- 
2.17.0