[Bug 111937] [CI][BAT] igt at i915_selftest@live_execlists - incomplete - GEM_BUG_ON(i915_active_is_idle(&ce->active))

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Wed Oct 9 14:32:04 UTC 2019


https://bugs.freedesktop.org/show_bug.cgi?id=111937

--- Comment #2 from Chris Wilson <chris at chris-wilson.co.uk> ---
This is bizarre; it looks quite straightforward but then unravels quickly as
you start pulling on threads.

<7> [696.700847] __intel_gt_set_wedged vcs0
<7> [696.700851] __intel_gt_set_wedged  Awake? 1
<7> [696.700854] __intel_gt_set_wedged  Hangcheck: 5864 ms ago
<7> [696.700856] __intel_gt_set_wedged  Reset count: 0 (global 0)
<7> [696.700859] __intel_gt_set_wedged  Requests:
<7> [696.702510] __intel_gt_set_wedged  MMIO base:  0x001c0000
<7> [696.703362] __intel_gt_set_wedged  RING_START: 0x0000a000
<7> [696.704157] __intel_gt_set_wedged  RING_HEAD:  0x00002038
<7> [696.704184] __intel_gt_set_wedged  RING_TAIL:  0x00002038
<7> [696.704223] __intel_gt_set_wedged  RING_CTL:   0x00003401 [waiting]
<7> [696.705925] __intel_gt_set_wedged  RING_MODE:  0x00000200 [idle]
<7> [696.706787] __intel_gt_set_wedged  RING_IMR: 00000000
<7> [696.709285] __intel_gt_set_wedged  ACTHD:  0x00000000_00002038
<7> [696.711045] __intel_gt_set_wedged  BBADDR: 0x00000000_00000000
<7> [696.711915] __intel_gt_set_wedged  DMA_FADDR: 0x00000000_0000c038
<7> [696.712812] __intel_gt_set_wedged  IPEIR: 0x00000000
<7> [696.713579] __intel_gt_set_wedged  IPEHR: 0x0e40c002
<7> [696.714443] __intel_gt_set_wedged  Execlist status: 0x00002098 20000040, entries 12
<7> [696.714446] __intel_gt_set_wedged  Execlist CSB read 6, write 7, tasklet queued? no (enabled)
<7> [696.714449] __intel_gt_set_wedged  Execlist CSB[7]: 0x00000002, context: 536870944
<7> [696.714472] __intel_gt_set_wedged          Active[0]: ring:{start:00006000, hwsp:ffff9140, seqno:00000001}, rq:  1b146:2*  prio=3 @ 8240ms: [i915]
<7> [696.714487] __intel_gt_set_wedged          Pending[0] ring:{start:0000a000, hwsp:ffff9180, seqno:00000002}, rq:  1b147:2!+  prio=4097 @ 8240ms: signaled
<7> [696.714492] __intel_gt_set_wedged          Pending[1] ring:{start:00006000, hwsp:ffff9140, seqno:00000001}, rq:  1b146:4-  prio=3 @ 8240ms: [i915]
<7> [696.714509] __intel_gt_set_wedged          E  1b146:2*  prio=3 @ 8240ms: [i915]
<7> [696.714512] __intel_gt_set_wedged          E  1b146:4-  prio=3 @ 8240ms: [i915]
<7> [696.714515] __intel_gt_set_wedged          Queue priority hint: 3

<0> [696.673408] i915_sel-5787    5.... 740573711us : __intel_context_do_pin: vcs0 context:1b146 pin ring:{head:0000, tail:0000}
<0> [696.673408] i915_sel-5787    5.... 740574064us : __intel_context_do_pin: vcs0 context:1b147 pin ring:{head:0000, tail:0000}
<0> [696.673408] i915_sel-5787    5.... 740574078us : __engine_unpark: vcs0
<0> [696.673408] i915_sel-5787    5.... 740574084us : __gt_unpark: 
<0> [696.673408] i915_sel-5787    5.... 740574655us : __i915_request_commit: vcs0 fence 1b146:2
<0> [696.673408] i915_sel-5787    5d..1 740574662us : __i915_request_submit: vcs0 fence 1b146:2, current 0
<0> [696.673408] i915_sel-5787    5d..1 740574663us : __execlists_submission_tasklet: vcs0: queue_priority_hint:-2147483648, submit:yes
<0> [696.673408] i915_sel-5787    5d..1 740574665us : trace_ports: vcs0: submit { 1b146:2, 0:0 }
<0> [696.673408] i915_sel-5787    5.... 740574723us : __i915_request_commit: vcs0 fence 1b147:2
<0> [696.673408] i915_sel-5787    5.... 740574754us : __i915_request_commit: vcs0 fence 1b146:4
<0> [696.673408]   <idle>-0       2..s1 740574757us : process_csb: vcs0 cs-irq head=5, tail=6
<0> [696.673408]   <idle>-0       2..s1 740574758us : process_csb: vcs0 csb[6]: status=0x00000001:0x20000000
<0> [696.673408]   <idle>-0       2..s1 740574760us : trace_ports: vcs0: promote { 1b146:2*, 0:0 }
<0> [696.673408]   <idle>-0       2d.s2 740574784us : __execlists_submission_tasklet: vcs0: preempting last=1b146:2, prio=3, hint=4097
<0> [696.673408]   <idle>-0       2d.s2 740574786us : __i915_request_unsubmit: vcs0 fence 1b146:2, current 1
<0> [696.673408]   <idle>-0       2d.s2 740574788us : __i915_request_submit: vcs0 fence 1b147:2, current 0
<0> [696.673408]   <idle>-0       2d.s2 740574798us : __i915_request_submit: vcs0 fence 1b146:2, current 1
<0> [696.673408]   <idle>-0       2d.s2 740574800us : __i915_request_submit: vcs0 fence 1b146:4, current 1
<0> [696.673408]   <idle>-0       2d.s2 740574801us : __execlists_submission_tasklet: vcs0: queue_priority_hint:-2147483648, submit:yes
<0> [696.673408]   <idle>-0       2d.s2 740574802us : trace_ports: vcs0: submit { 1b147:2, 1b146:4 }
<0> [696.673408] i915_sel-5787    5.... 740574910us : i915_request_retire: vcs0 fence 1b147:2, current 2
<0> [696.673408] i915_sel-5787    5.... 740574912us : intel_context_unpin: vcs0 context:1b147 retire
<0> [696.673408] i915_sel-5787    5.... 740574916us : __intel_context_retire: vcs0 context:1b147 retire

So the HW froze: there is a CS event sitting in the queue (CSB read 6, write 7,
tasklet not queued), but we never saw the interrupt for it. (So the HW died? Or
we just missed an interrupt? The latter is nice and scary.)
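
To make the "event in the queue" reading concrete, here is a minimal sketch in
C of how the read/write indices and the entry count from the dump can be
compared. The helper names csb_pending() and ring_is_empty() are hypothetical
and are not taken from i915. The sketch assumes both indices are already in
the range [0, entries) and that "read" names the last entry consumed by the
driver while "write" names the last entry written by the HW.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not i915 code): number of context-status events
 * the HW has written that the driver has not consumed yet. Assumes both
 * indices are in [0, entries) and wrap around at 'entries'.
 */
unsigned int csb_pending(unsigned int read, unsigned int write,
                         unsigned int entries)
{
    /* Distance from read to write, modulo the size of the buffer. */
    return (write + entries - read) % entries;
}

/*
 * Hypothetical helper (not i915 code): the ring has nothing left to
 * execute once the HW head has caught up with the tail.
 */
bool ring_is_empty(unsigned int head, unsigned int tail)
{
    return head == tail;
}
```

With the values from the dump, csb_pending(6, 7, 12) gives 1, matching the
single unprocessed entry CSB[7], and ring_is_empty(0x2038, 0x2038) is true for
the RING_HEAD/RING_TAIL pair shown for vcs0.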

During reset, the context idled, which also shouldn't have happened. I think
the engine parked, but as we are after the set-wedged (and after the GEM_TRACE)
it is retired immediately. Hmm. Seems possible.
