[Intel-gfx] [PATCH 01/17] drm/i915/execlists: Always clear pending&inflight requests on reset
Chris Wilson
chris at chris-wilson.co.uk
Tue Jul 30 13:30:19 UTC 2019
If we skip the reset because we found the engine inactive at the time of
the reset, we still need to clear the residual inflight & pending request
bookkeeping so that it reflects the current state of the HW.
Otherwise, we may end up stuck in a loop like:
<7> [416.490346] hangcheck rcs0
<7> [416.490371] hangcheck Awake? 1
<7> [416.490376] hangcheck Hangcheck: 8003 ms ago
<7> [416.490380] hangcheck Reset count: 0 (global 0)
<7> [416.490383] hangcheck Requests:
<7> [416.491210] hangcheck RING_START: 0x0017b000
<7> [416.491983] hangcheck RING_HEAD: 0x00000048
<7> [416.491992] hangcheck RING_TAIL: 0x00000048
<7> [416.492006] hangcheck RING_CTL: 0x00000000
<7> [416.492037] hangcheck RING_MODE: 0x00000200 [idle]
<7> [416.492044] hangcheck RING_IMR: 00000000
<7> [416.492809] hangcheck ACTHD: 0x00000000_9ca00048
<7> [416.492824] hangcheck BBADDR: 0x00000000_00001004
<7> [416.492838] hangcheck DMA_FADDR: 0x00000000_00000000
<7> [416.492845] hangcheck IPEIR: 0x00000000
<7> [416.492852] hangcheck IPEHR: 0x00000000
<7> [416.492863] hangcheck Execlist status: 0x00018001 00000000, entries 12
<7> [416.492869] hangcheck Execlist CSB read 1, write 1, tasklet queued? no (enabled)
<7> [416.492938] hangcheck Pending[0] ring:{start:0017b000, hwsp:fedf9000, seqno:00016fd6}, rq: 20ffa:16fd6!+ prio=-4094 @ 8307ms: signaled
<7> [416.492972] hangcheck Queue priority hint: -4093
<7> [416.492979] hangcheck Q 20ffa:16fd8- prio=-4093 @ 8307ms: [i915]
<7> [416.492985] hangcheck Q 20ffa:16fda prio=-4094 @ 8307ms: [i915]
<7> [416.492990] hangcheck Q 20ffa:16fdc prio=-4094 @ 8307ms: [i915]
<7> [416.492996] hangcheck Q 20ffa:16fde prio=-4094 @ 8307ms: [i915]
<7> [416.493001] hangcheck Q 20ffa:16fe0 prio=-4094 @ 8307ms: [i915]
<7> [416.493007] hangcheck Q 20ffa:16fe2 prio=-4094 @ 8307ms: [i915]
<7> [416.493013] hangcheck Q 20ffa:16fe4 prio=-4094 @ 8307ms: [i915]
<7> [416.493021] hangcheck ...skipping 21 queued requests...
<7> [416.493027] hangcheck Q 20ffa:17010 prio=-4094 @ 8307ms: [i915]
<7> [416.493081] hangcheck HWSP:
<7> [416.493089] hangcheck [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [416.493094] hangcheck *
<7> [416.493100] hangcheck [0040] 10008002 00000000 10000018 00000000 10000018 00000000 10000001 00000000
<7> [416.493106] hangcheck [0060] 10000018 00000000 10000001 00000000 10000018 00000000 10000001 00000000
<7> [416.493111] hangcheck *
<7> [416.493117] hangcheck [00a0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000001
<7> [416.493123] hangcheck [00c0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [416.493127] hangcheck *
<7> [416.493132] hangcheck Idle? no
<6> [416.512124] i915 0000:00:02.0: GPU HANG: ecode 11:0:0x00000000, hang on rcs0
<6> [416.512205] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
<6> [416.512207] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
<6> [416.512208] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
<6> [416.512210] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
<6> [416.512212] [drm] GPU crash dump saved to /sys/class/drm/card0/error
<5> [416.513602] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
<7> [424.489258] hangcheck rcs0
<7> [424.489263] hangcheck Awake? 1
<7> [424.489267] hangcheck Hangcheck: 5954 ms ago
<7> [424.489271] hangcheck Reset count: 1 (global 0)
<7> [424.489274] hangcheck Requests:
<7> [424.490128] hangcheck RING_START: 0x00000000
<7> [424.490870] hangcheck RING_HEAD: 0x00000000
<7> [424.490877] hangcheck RING_TAIL: 0x00000000
<7> [424.490887] hangcheck RING_CTL: 0x00000000
<7> [424.490897] hangcheck RING_MODE: 0x00000200 [idle]
<7> [424.490904] hangcheck RING_IMR: 00000000
<7> [424.490917] hangcheck ACTHD: 0x00000000_00000000
<7> [424.490930] hangcheck BBADDR: 0x00000000_00000000
<7> [424.490943] hangcheck DMA_FADDR: 0x00000000_00000000
<7> [424.490950] hangcheck IPEIR: 0x00000000
<7> [424.490956] hangcheck IPEHR: 0x00000000
<7> [424.490968] hangcheck Execlist status: 0x00000001 00000000, entries 12
<7> [424.490972] hangcheck Execlist CSB read 11, write 11, tasklet queued? no (enabled)
<7> [424.490983] hangcheck Pending[0] ring:{start:0017b000, hwsp:fedf9000, seqno:00016fd6}, rq: 20ffa:16fd6!+ prio=-4094 @ 16305ms: signaled
<7> [424.490989] hangcheck Queue priority hint: -4093
<7> [424.490996] hangcheck Q 20ffa:16fd8- prio=-4093 @ 16305ms: [i915]
<7> [424.491001] hangcheck Q 20ffa:16fda prio=-4094 @ 16305ms: [i915]
<7> [424.491006] hangcheck Q 20ffa:16fdc prio=-4094 @ 16305ms: [i915]
<7> [424.491011] hangcheck Q 20ffa:16fde prio=-4094 @ 16305ms: [i915]
<7> [424.491016] hangcheck Q 20ffa:16fe0 prio=-4094 @ 16305ms: [i915]
<7> [424.491022] hangcheck Q 20ffa:16fe2 prio=-4094 @ 16305ms: [i915]
<7> [424.491048] hangcheck Q 20ffa:16fe4 prio=-4094 @ 16305ms: [i915]
<7> [424.491057] hangcheck ...skipping 21 queued requests...
<7> [424.491063] hangcheck Q 20ffa:17010 prio=-4094 @ 16305ms: [i915]
<7> [424.491095] hangcheck HWSP:
<7> [424.491102] hangcheck [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [424.491106] hangcheck *
<7> [424.491113] hangcheck [0040] 10008002 00000000 10000018 00000000 10000018 00000000 10000001 00000000
<7> [424.491118] hangcheck [0060] 10000018 00000000 10000001 00000000 10000018 00000000 10000001 00000000
<7> [424.491122] hangcheck *
<7> [424.491127] hangcheck [00a0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000000b
<7> [424.491133] hangcheck [00c0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [424.491136] hangcheck *
<7> [424.491141] hangcheck Idle? no
<5> [424.491834] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Because the pending array was not cleared on reset, the stale Pending[0]
entry persists indefinitely and hangcheck keeps resetting the engine.
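As a rough illustration of the control flow only, here is a toy model of
the reset path after this patch. Every name in it (toy_engine,
toy_cancel_ports, toy_reset, toy_has_stale_ports, contexts_repaired) is
hypothetical and does not exist in i915; the sketch assumes that the
skipped-reset case jumps straight to the unwind label, and shows why
clearing the ports there covers both the skipped and the full reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PORTS 2

/* Hypothetical stand-in for the per-engine execlists bookkeeping. */
struct toy_engine {
	int pending[TOY_MAX_PORTS];	/* ids awaiting HW ack, 0 = empty */
	int inflight[TOY_MAX_PORTS];	/* ids believed to be on HW, 0 = empty */
	bool active;			/* was the engine busy at reset time? */
	int contexts_repaired;		/* counts full (non-skipped) resets */
};

/* Forget everything we believed had been submitted to the hardware. */
static void toy_cancel_ports(struct toy_engine *e)
{
	for (size_t i = 0; i < TOY_MAX_PORTS; i++) {
		e->pending[i] = 0;
		e->inflight[i] = 0;
	}
}

/*
 * Simplified shape of the reset path: the early exit for an inactive
 * engine still reaches the label where the ports are cleared.
 */
static void toy_reset(struct toy_engine *e)
{
	assert(e != NULL);

	if (!e->active)
		goto unwind;	/* reset skipped: engine was idle */

	e->contexts_repaired++;	/* placeholder for fixing the context image */

unwind:
	toy_cancel_ports(e);	/* runs on both paths */
}

/* True if any port still holds a request id. */
static bool toy_has_stale_ports(const struct toy_engine *e)
{
	for (size_t i = 0; i < TOY_MAX_PORTS; i++)
		if (e->pending[i] || e->inflight[i])
			return true;
	return false;
}
```

In this model, an idle engine that still carries a completed request in
pending[0] (as in the dump above) comes out of toy_reset() with both
arrays empty. If toy_cancel_ports() were called only after the early
exit, that entry would survive every reset.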
Fixes: fff8102aaed5 ("drm/i915/execlists: Process interrupted context on reset")
Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
---
drivers/gpu/drm/i915/gt/intel_lrc.c | 13 +------------
1 file changed, 1 insertion(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 4d7c4d0dbf75..86dd1eddceac 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2283,18 +2283,6 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
GEM_BUG_ON(i915_active_is_idle(&ce->active));
GEM_BUG_ON(!i915_vma_is_pinned(ce->state));
rq = active_request(rq);
-
- /*
- * Catch up with any missed context-switch interrupts.
- *
- * Ideally we would just read the remaining CSB entries now that we
- * know the gpu is idle. However, the CSB registers are sometimes^W
- * often trashed across a GPU reset! Instead we have to rely on
- * guessing the missed context-switch events by looking at what
- * requests were completed.
- */
- execlists_cancel_port_requests(execlists);
-
if (!rq) {
ce->ring->head = ce->ring->tail;
goto out_replay;
@@ -2356,6 +2344,7 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
unwind:
/* Push back any incomplete requests for replay after the reset. */
+ execlists_cancel_port_requests(execlists);
__unwind_incomplete_requests(engine);
}
--
2.22.0