[Bug 105900] [CI] igt@gem_exec_* - fail - Failed assertion: !"GPU hung"

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Thu May 3 14:35:50 UTC 2018


https://bugs.freedesktop.org/show_bug.cgi?id=105900

--- Comment #7 from Chris Wilson <chris at chris-wilson.co.uk> ---
(In reply to Chris Wilson from comment #6)
> (In reply to Chris Wilson from comment #5)
> > (In reply to Martin Peres from comment #4)
> > > https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_29/fi-cnl-y3/igt@gem_exec_await@wide-contexts.html
> > > 
> > > (gem_exec_await:2291) igt_aux-CRITICAL: Test assertion failure function
> > > sig_abort, file ../lib/igt_aux.c:481:
> > > (gem_exec_await:2291) igt_aux-CRITICAL: Failed assertion: !"GPU hung"
> > > Subtest wide-contexts failed.
> > 
> > This is a different issue. The GPU hang here is a result of hitting a
> > blocking ioctl in the test.
> 
> Using Execlists submission
> Ring size: 143 batches
> 
> If we can only fit 143 batches in a ring, why did we submit 144?...

Nah, the last batch has seqno 144. It's just an off-by-one (or at least
misleading) comment about the number of skipped batches. The ring that got
stuck was:

<7>[  134.752727] hangcheck vecs0
<7>[  134.752730] hangcheck     current seqno c707, last c740, hangcheck c707 [4031 ms]
<7>[  134.752733] hangcheck     Reset count: 0 (global 0)
<7>[  134.752736] hangcheck     Requests:
<7>[  134.752740] hangcheck             first  c708 [4e17:1] prio=0 @ 4036ms: gem_exec_await[2291]/4
<7>[  134.752744] hangcheck             last   c740 [4e17:39] prio=0 @ 4033ms: gem_exec_await[2291]/4
<7>[  134.752748] hangcheck             active c708 [4e17:1] prio=0 @ 4036ms: gem_exec_await[2291]/4
<7>[  134.752752] hangcheck             [head 0000, postfix 0030, tail 0050, batch 0x00000000_00040000]
<7>[  134.752755] hangcheck             ring->start:  0x035ac000
<7>[  134.752758] hangcheck             ring->head:   0x00000000
<7>[  134.752761] hangcheck             ring->tail:   0x000011c8
<7>[  134.752764] hangcheck             ring->emit:   0x000011d0
<7>[  134.752767] hangcheck             ring->space:  0x00002df0
<7>[  134.752772] hangcheck     RING_START: 0x035ac000
<7>[  134.752776] hangcheck     RING_HEAD:  0x00000020
<7>[  134.752780] hangcheck     RING_TAIL:  0x000011c8
<7>[  134.752786] hangcheck     RING_CTL:   0x00003001
<7>[  134.752791] hangcheck     RING_MODE:  0x00000000
<7>[  134.752795] hangcheck     RING_IMR: fffffeff
<7>[  134.752802] hangcheck     ACTHD:  0x00000000_00040000
<7>[  134.752809] hangcheck     BBADDR: 0x00000000_00040001
<7>[  134.752816] hangcheck     DMA_FADDR: 0x00000000_00040200
<7>[  134.752821] hangcheck     IPEIR: 0x00000000
<7>[  134.752825] hangcheck     IPEHR: 0x18800101
<7>[  134.752830] hangcheck     Execlist status: 0x00044052 0000057f
<7>[  134.752835] hangcheck     Execlist CSB read 1 [1 cached], write 1 [1 from hws], interrupt posted? no, tasklet queued? no (enabled)
<7>[  134.752840] hangcheck             ELSP[0] count=1, rq: c740 [4e17:39] prio=0 @ 4033ms: gem_exec_await[2291]/4
<7>[  134.752843] hangcheck             ELSP[1] idle
<7>[  134.752846] hangcheck             HW active? 0x5
<7>[  134.752850] hangcheck             E c708 [4e17:1] prio=0 @ 4036ms: gem_exec_await[2291]/4
<7>[  134.752853] hangcheck             E c709 [4e17:2] prio=0 @ 4036ms: gem_exec_await[2291]/4
<7>[  134.752857] hangcheck             E c70a [4e17:3] prio=0 @ 4036ms: gem_exec_await[2291]/4
<7>[  134.752861] hangcheck             E c70b [4e17:4] prio=0 @ 4036ms: gem_exec_await[2291]/4
<7>[  134.752865] hangcheck             E c70c [4e17:5] prio=0 @ 4036ms: gem_exec_await[2291]/4
<7>[  134.752868] hangcheck             E c70d [4e17:6] prio=0 @ 4036ms: gem_exec_await[2291]/4
<7>[  134.752872] hangcheck             E c70e [4e17:7] prio=0 @ 4036ms: gem_exec_await[2291]/4
<7>[  134.752880] hangcheck             ...skipping 49 executing requests...
<7>[  134.752884] hangcheck             E c740 [4e17:39] prio=0 @ 4033ms: gem_exec_await[2291]/4
<7>[  134.752887] hangcheck             Queue priority: -2147483648
<7>[  134.752890] hangcheck IRQ? 0x1 (breadcrumbs? yes) (execlists? no)
<7>[  134.752893] hangcheck HWSP:
<7>[  134.752897] hangcheck 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7>[  134.752900] hangcheck *
<7>[  134.752904] hangcheck 00000040 00008002 0000057f 00008002 0000057f 00008002 0000057f 00008002 0000057f
<7>[  134.752909] hangcheck 00000060 00008002 0000057f 00008002 0000057f 00000000 00000000 00000000 00000000
<7>[  134.752913] hangcheck 00000080 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7>[  134.752917] hangcheck 000000a0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000001
<7>[  134.752922] hangcheck 000000c0 0000c707 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7>[  134.752926] hangcheck 000000e0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7>[  134.752929] hangcheck *
<7>[  134.752932] hangcheck Idle? no

which isn't out of ring space (ring->space still reports 0x2df0 bytes free)...
Oh, maybe it is just a premature hangcheck, but how? Hmm.
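
For anyone wanting to double-check the "not out of ring space" reading, here
is a rough sketch that recomputes the free space and the number of outstanding
requests from the values in the dump. It is not the driver code. The helper
names are made up for illustration, and it rests on two assumptions: that
RING_CTL bits 20:12 encode the ring length in 4 KiB pages minus one, and that
one 64-byte cacheline is kept in reserve between emit and head.

```python
# Sketch only: recompute ring space and in-flight requests from the dump.
# All helper names are hypothetical; the bit layout and the 64-byte reserve
# are assumptions, not taken from the log itself.

CACHELINE_BYTES = 64  # assumed reserve kept free between emit and head


def ring_size_from_ctl(ring_ctl):
    """Assumed decoding: bits 20:12 hold (number of 4 KiB pages - 1)."""
    return (((ring_ctl >> 12) & 0x1FF) + 1) * 4096


def ring_space(head, emit, size):
    """Free bytes in a power-of-two ring, minus the reserved cacheline."""
    return (head - emit - CACHELINE_BYTES) & (size - 1)


def requests_in_flight(current_seqno, last_seqno):
    """Requests submitted but not yet completed (32-bit wraparound safe)."""
    return (last_seqno - current_seqno) & 0xFFFFFFFF


size = ring_size_from_ctl(0x00003001)         # RING_CTL from the dump
space = ring_space(0x00000000, 0x000011D0, size)  # ring->head, ring->emit
pending = requests_in_flight(0xC707, 0xC740)  # current seqno, last seqno

print(hex(size), hex(space), pending)  # prints: 0x4000 0x2df0 57
```

Under those assumptions the result matches the reported ring->space of 0x2df0,
and the 57 outstanding requests agree with the 8 listed plus 49 skipped.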
