<html>
<head>
<base href="https://bugs.freedesktop.org/">
</head>
<body>
<p>
<div>
<b><a class="bz_bug_link
bz_status_ASSIGNED "
title="ASSIGNED - [BAT] [GLK-DSI only] igt@gem_* - Failed assertion: !&quot;GPU hung&quot; - and its aftermath"
href="https://bugs.freedesktop.org/show_bug.cgi?id=103514#c34">Comment # 34</a>
on <a class="bz_bug_link
bz_status_ASSIGNED "
title="ASSIGNED - [BAT] [GLK-DSI only] igt@gem_* - Failed assertion: !&quot;GPU hung&quot; - and its aftermath"
href="https://bugs.freedesktop.org/show_bug.cgi?id=103514">bug 103514</a>
from <span class="vcard"><a class="email" href="mailto:marta.lofstedt@intel.com" title="Marta Löfstedt <marta.lofstedt@intel.com>"> <span class="fn">Marta Löfstedt</span></a>
</span></b>
<pre>It starts with:
<a href="https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3511/fi-glk-dsi/igt@gem_exec_flush@basic-uc-set-default.html">https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3511/fi-glk-dsi/igt@gem_exec_flush@basic-uc-set-default.html</a>
(gem_exec_flush:1725) igt-aux-CRITICAL: Failed assertion: !"GPU hung"
<7>[ 170.959810] [IGT] gem_exec_flush: starting subtest basic-uc-set-default
...
<7>[ 177.768550] missed_breadcrumb rcs0 missed breadcrumb at intel_breadcrumbs_hangcheck+0x5a/0x80 [i915]
...
<6>[ 180.857713] [drm] GPU HANG: ecode 9:0:0x8fdafffa, in gem_exec_flush [1725], reason: Hang on rcs0, action: reset
<7>[ 180.858578] [drm:i915_reset_device [i915]] resetting chip
<5>[ 180.858755] i915 0000:00:02.0: Resetting chip after gpu hang
<7>[ 181.361556] [drm:intel_gpu_reset [i915]] rcs0: timed out on STOP_RING
<3>[ 182.065059] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
then:
<a href="https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3511/fi-glk-dsi/igt@gem_exec_reloc@basic-cpu-read.html">https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3511/fi-glk-dsi/igt@gem_exec_reloc@basic-cpu-read.html</a>
The pstore for this incomplete looks legit, but most of it is already in dmesg.
From dmesg:
<7>[ 187.664410] [IGT] gem_exec_reloc: exiting, ret=77
<4>[ 187.770081] WARNING: can't dereference iret registers at 00000000b08c140c for ip page_fault+0x7/0x30
<0>[ 187.770083] BUG: stack guard page was hit at 00000000dd82e48c (stack is 00000000ca52e808..0000000085d6198d)
<4>[ 187.770178] kernel stack overflow (double-fault): 0000 [#1] PREEMPT SMP
<0>[ 187.770191] Dumping ftrace buffer:
<0>[ 187.770199] ---------------------------------
<0>[ 187.770281] CPU:3 [LOST 63587 EVENTS]
gem_exec-1729 3..s1 173324421us : execlists_submission_tasklet: rcs0 in[0]: ctx=2.1, seqno=5d0bc
...
<4>[ 187.801374] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul i915 ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me r8169 mii mei prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel
<4>[ 187.801505] CPU: 2 PID: 1377 Comm: python3 Tainted: G U W 4.15.0-rc3-CI-CI_DRM_3511+ #1
<4>[ 187.801531] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
<4>[ 187.801564] RIP: 0010:page_fault+0x7/0x30
<4>[ 187.801578] RSP: 0018:ffffc90001d83fa8 EFLAGS: 00010083
<4>[ 187.801597] RAX: 0000000080000000 RBX: 0000000000000000 RCX: 0000000000000000
<4>[ 187.801618] RDX: 0000000080000610 RSI: 0000000000000000 RDI: ffffc90001d840f8
<4>[ 187.801639] RBP: 0000000080000610 R08: 0000000000000001 R09: 0101010101010101
<4>[ 187.801659] R10: ffffc90001d87a90 R11: 0000000000000000 R12: ffffc90001d840f8
<4>[ 187.801680] R13: ffff8801733c51c0 R14: 0000000000000001 R15: ffff8801733c51c0
<4>[ 187.801702] FS: 00007fac1674e700(0000) GS:ffff88017fd00000(0000) knlGS:0000000000000000
<4>[ 187.801726] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 187.801744] CR2: ffffc90001d83f98 CR3: 000000016f353000 CR4: 0000000000340ee0
<4>[ 187.801764] Call Trace:
<4>[ 187.801782] ? no_context+0x3dc/0x430
<4>[ 187.801800] ? __do_page_fault+0x196/0x560
...
<1>[ 187.804948] RIP: page_fault+0x7/0x30 RSP: ffffc90001d83fa8
<4>[ 187.804968] ---[ end trace 7832dee94e24beea ]---
<3>[ 188.000284] BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:34
<3>[ 188.000315] in_atomic(): 1, irqs_disabled(): 1, pid: 1377, name: python3
<4>[ 188.000334] INFO: lockdep is turned off.
<4>[ 188.000347] irq event stamp: 1180122
<4>[ 188.000367] hardirqs last enabled at (1180121): [<00000000e846d9d1>] get_page_from_freelist+0x24c/0x14c0
<4>[ 188.000395] hardirqs last disabled at (1180122): [<00000000804f94d3>] __slab_alloc.isra.24.constprop.29+0x19/0x70
<4>[ 188.000425] softirqs last enabled at (1179892): [<000000002b075771>] __do_softirq+0x3aa/0x4de
<4>[ 188.000451] softirqs last disabled at (1179885): [<00000000a976b967>] irq_exit+0xaa/0xc0
<3>[ 188.000473] Preemption disabled at:
<4>[ 188.000478] [<000000005fa92adc>] ist_enter+0x1c/0xa0
<4>[ 188.000507] CPU: 2 PID: 1377 Comm: python3 Tainted: G UD W 4.15.0-rc3-CI-CI_DRM_3511+ #1
<4>[ 188.000531] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
<4>[ 188.000560] Call Trace:
<4>[ 188.000578] dump_stack+0x5f/0x86
<4>[ 188.000593] ___might_sleep+0x1d9/0x240
Then it continues in:
<a href="https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3511/fi-glk-dsi/pstore0-1513197126_Oops_2.log">https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3511/fi-glk-dsi/pstore0-1513197126_Oops_2.log</a>
and it is actually a softdog (watchdog) timeout:
...
<4>[ 188.000610] exit_signals+0x1b/0x2a0
<4>[ 188.000624] do_exit+0x93/0xcc0
<4>[ 188.000638] ? trace_hardirqs_off_caller+0x75/0xd0
<4>[ 188.000654] ? do_syscall_64+0x19/0x1a0
<4>[ 188.000671] rewind_stack_do_exit+0x17/0x20
<6>[ 188.000769] note: python3[1377] exited with preempt_count 1
<12>[ 277.879085] owatch: TIMEOUT!
<12>[ 277.879252] owatch: timeout for /dev/watchdog0 set to 10 (requested 10)</pre>
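<pre>A minimal sketch for lining up the timeline above (GPU hang, reset timeout, stack overflow, then the watchdog) from a raw dmesg dump. It assumes every kernel line starts with the numeric syslog priority in angle brackets followed by the bracketed seconds-since-boot timestamp, as in these logs, and that unprefixed lines are mail-wrapped continuations of the previous line. parse_dmesg and first_match are hypothetical helpers, not part of IGT or the CI tooling.
```python
import re
from dataclasses import dataclass

# Matches the dmesg prefix used in the excerpts above, e.g. "<7>[ 170.959810] ...":
# the raw syslog priority in angle brackets, then seconds since boot.
LINE_RE = re.compile(r"^<(\d+)>\[\s*(\d+\.\d+)\]\s?(.*)$")


@dataclass
class Entry:
    prio: int  # raw priority = facility * 8 + level (12 = user facility, level 4)
    ts: float  # seconds since boot
    msg: str

    @property
    def level(self) -> int:
        # The low three bits are the log level (0 = emerg ... 7 = debug).
        return self.prio & 7


def parse_dmesg(lines):
    """Parse prefixed dmesg lines. Lines without a prefix are assumed to be
    wrapped continuations and are appended to the previous entry."""
    entries = []
    for raw in lines:
        m = LINE_RE.match(raw)
        if m:
            entries.append(Entry(int(m.group(1)), float(m.group(2)), m.group(3)))
        elif entries and raw.strip():
            entries[-1].msg += " " + raw.strip()
    return entries


def first_match(entries, needle):
    """Return the first entry whose message contains needle, or None."""
    return next((e for e in entries if needle in e.msg), None)


# A few lines copied from the logs above, two of them wrapped as in the mail.
SAMPLE = """<6>[ 180.857713] [drm] GPU HANG: ecode 9:0:0x8fdafffa, in gem_exec_flush
[1725], reason: Hang on rcs0, action: reset
<3>[ 182.065059] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request
timeout
<4>[ 187.770178] kernel stack overflow (double-fault): 0000 [#1] PREEMPT SMP
<12>[ 277.879085] owatch: TIMEOUT!""".splitlines()

entries = parse_dmesg(SAMPLE)
for e in entries:
    print(f"{e.ts:12.6f} level={e.level} {e.msg}")

hang = first_match(entries, "GPU HANG")
wd = first_match(entries, "owatch: TIMEOUT")
print(f"GPU hang to watchdog timeout: {wd.ts - hang.ts:.3f} s")
```
Run over the excerpts above, this also makes the roughly 90 s of silence between the last kernel message (188.000769) and the owatch timeout (277.879085) easy to spot.</pre>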
</div>
</p>
</body>
</html>