[Bug 109606] [CI][DRMTIP] igt@pm_rps@reset - dmesg-fail - Failed assertion: __gem_execbuf_wr(fd, execbuf) == 0

Mon Feb 11 13:12:32 UTC 2019

https://bugs.freedesktop.org/show_bug.cgi?id=109606

--- Comment #2 from Chris Wilson <chris at chris-wilson.co.uk> ---
<7>[  104.332734] [IGT] pm_rps: starting subtest reset
<5>[  104.333284] Setting dangerous option reset - tainting kernel
<6>[  105.594685] i915 0000:00:02.0: GPU HANG: ecode 11:0:0x00000000, Manually set wedged engine mask = ffffffffffffffff
<6>[  105.594790] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
<6>[  105.594793] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
<6>[  105.594795] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
<6>[  105.594797] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
<6>[  105.594800] [drm] GPU crash dump saved to /sys/class/drm/card0/error
<5>[  105.594849] i915 0000:00:02.0: Resetting rcs0 for Manually set wedged engine mask = ffffffffffffffff
<5>[  105.596370] i915 0000:00:02.0: Resetting bcs0 for Manually set wedged engine mask = ffffffffffffffff
<5>[  105.596494] i915 0000:00:02.0: Resetting vcs0 for Manually set wedged engine mask = ffffffffffffffff
<5>[  105.596627] i915 0000:00:02.0: Resetting vcs2 for Manually set wedged engine mask = ffffffffffffffff
<5>[  105.596756] i915 0000:00:02.0: Resetting vecs0 for Manually set wedged engine mask = ffffffffffffffff
<7>[  107.207564] [drm:edp_panel_vdd_off_sync [i915]] Turning eDP port A VDD off
<7>[  107.207783] [drm:edp_panel_vdd_off_sync [i915]] PP_STATUS: 0x80000008 PP_CONTROL: 0x00000067
<7>[  120.391709] hangcheck rcs0
<7>[  120.391740] hangcheck     current seqno 9eb, last a1d, hangcheck 9eb [14016 ms]
<7>[  120.391745] hangcheck     Reset count: 1 (global 0)
<7>[  120.391751] hangcheck     Requests:
<7>[  120.391773] hangcheck             first  a0c [27:1402] prio=2 @ 14797ms: pm_rps[1244]/0
<7>[  120.391781] hangcheck             last   a1d+ [27:1424] prio=1 @ 13792ms: pm_rps[1244]/0
<7>[  120.391806] hangcheck     RING_START: 0x0000b000
<7>[  120.391813] hangcheck     RING_HEAD:  0x000000c8
<7>[  120.391820] hangcheck     RING_TAIL:  0x00001b10
<7>[  120.391829] hangcheck     RING_CTL:   0x00003001
<7>[  120.391838] hangcheck     RING_MODE:  0x00000000
<7>[  120.391844] hangcheck     RING_IMR: 00000000
<7>[  120.391855] hangcheck     ACTHD:  0x00000005_443a9d90
<7>[  120.391866] hangcheck     BBADDR: 0x00000005_443aec41
<7>[  120.391878] hangcheck     DMA_FADDR: 0x00000005_443b3980
<7>[  120.391884] hangcheck     IPEIR: 0x00000000
<7>[  120.391891] hangcheck     IPEHR: 0x18800101
<7>[  120.391900] hangcheck     Execlist status: 0x00202098 00000040
<7>[  120.391908] hangcheck     Execlist CSB read 5, write 5 [mmio:5], tasklet queued? no (enabled)
<7>[  120.391918] hangcheck             ELSP[0] count=1, ring:{start:0000b000, hwsp:fffee280}, rq: a1d+ [27:1424] prio=1 @ 13792ms: pm_rps[1244]/0
<7>[  120.391923] hangcheck             ELSP[1] idle
<7>[  120.391927] hangcheck             HW active? 0x5
<7>[  120.391983] hangcheck             E a0c [27:1402] prio=2 @ 14797ms: pm_rps[1244]/0
<7>[  120.392047] hangcheck             E a0d [27:1404] prio=1 @ 13793ms: pm_rps[1244]/0
<7>[  120.392054] hangcheck             E a0e [27:1406] prio=1 @ 13792ms: pm_rps[1244]/0
<7>[  120.392061] hangcheck             E a0f [27:1408] prio=1 @ 13792ms: pm_rps[1244]/0
<7>[  120.392068] hangcheck             E a10 [27:140a] prio=1 @ 13792ms: pm_rps[1244]/0
<7>[  120.392074] hangcheck             E a11 [27:140c] prio=1 @ 13792ms: pm_rps[1244]/0
<7>[  120.392081] hangcheck             E a12 [27:140e] prio=1 @ 13792ms: pm_rps[1244]/0
<7>[  120.392087] hangcheck             ...skipping 10 executing requests...
<7>[  120.392094] hangcheck             E a1d+ [27:1424] prio=1 @ 13792ms: pm_rps[1244]/0
<7>[  120.392098] hangcheck             Queue priority hint: 1
<7>[  120.392102] hangcheck HWSP:
<7>[  120.392111] hangcheck [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7>[  120.392115] hangcheck *
<7>[  120.392123] hangcheck [0040] 10008002 00000040 10008002 00000040 10008002 00000040 10008002 00000040
<7>[  120.392130] hangcheck [0060] 10008002 00000040 10008002 00000040 00000000 00000000 00000000 00000000
<7>[  120.392137] hangcheck [0080] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7>[  120.392144] hangcheck [00a0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000005
<7>[  120.392151] hangcheck [00c0] 000009eb 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7>[  120.392158] hangcheck [00e0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7>[  120.392162] hangcheck *
<7>[  120.392167] hangcheck Idle? no
<7>[  120.392171] hangcheck Signals:
<7>[  120.392200] hangcheck     [27:1424] @ 13792ms
<5>[  120.392420] i915 0000:00:02.0: Resetting rcs0 for no progress on rcs0

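The above is peculiar: our writes into the global HWSP simply vanish (the
engine still reports current seqno 9eb while requests a0c through a1d are
listed as executing), and we quite rightly conclude that we are unable to
recover. The error seems related to bug #109605.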
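To make the "writes vanish" observation concrete, here is a minimal Python sketch (not i915 code) that pulls the breadcrumb out of the HWSP dump and compares it against the executing requests using a wraparound-safe test in the style of i915's `i915_seqno_passed()`, i.e. `(s32)(seq1 - seq2) >= 0`. The byte offset 0x00c0 is an assumption, chosen because that is where 000009eb (matching "current seqno 9eb") appears in the dump; `hwsp_seqno`, `ROW_RE` and `HWSP_SEQNO_OFFSET` are hypothetical names for illustration.

```python
import re
from typing import Optional

# Assumption: the global breadcrumb sits at HWSP byte offset 0x00c0; that is
# where 000009eb (matching "current seqno 9eb") appears in the dump above.
HWSP_SEQNO_OFFSET = 0xC0

# One dump row: "[offset]" followed by eight 32-bit hex words.
ROW_RE = re.compile(r"\[([0-9a-f]{4})\]((?:\s+[0-9a-f]{8}){8})")


def seqno_passed(seq1: int, seq2: int) -> bool:
    """Has seq1 reached seq2? Wraparound-safe, like (s32)(seq1 - seq2) >= 0."""
    diff = (seq1 - seq2) & 0xFFFFFFFF
    # Top bit clear means the signed 32-bit difference is non-negative.
    return diff < 0x80000000


def hwsp_seqno(dump: str) -> Optional[int]:
    """Return the dword at HWSP_SEQNO_OFFSET from a hangcheck HWSP dump."""
    for m in ROW_RE.finditer(dump):
        base = int(m.group(1), 16)
        words = [int(w, 16) for w in m.group(2).split()]
        if base <= HWSP_SEQNO_OFFSET < base + 4 * len(words):
            return words[(HWSP_SEQNO_OFFSET - base) // 4]
    return None  # no printed row covers the offset


# Rows copied from the rcs0 dump above.
dump = """
hangcheck [00a0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000005
hangcheck [00c0] 000009eb 00000000 00000000 00000000 00000000 00000000 00000000 00000000
"""
current = hwsp_seqno(dump)
first, last = 0xA0C, 0xA1D  # oldest and newest executing requests on rcs0
print(hex(current), seqno_passed(current, first), seqno_passed(current, last))
```

Run as-is it prints `0x9eb False False`: neither the oldest (a0c) nor the newest (a1d) request has been passed by the value in the HWSP, which is what a breadcrumb that never advanced looks like. The signed 32-bit difference keeps the comparison correct across a seqno wrap, e.g. 0x2 counts as having passed 0xfffffff0.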
