[Bug 103514] [BAT] [GLK-DSI only] igt at gem_* - Failed assertion: !"GPU hung" - and its aftermath

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Thu Dec 14 07:27:52 UTC 2017


https://bugs.freedesktop.org/show_bug.cgi?id=103514

--- Comment #34 from Marta Löfstedt <marta.lofstedt at intel.com> ---
It starts with:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3511/fi-glk-dsi/igt@gem_exec_flush@basic-uc-set-default.html
(gem_exec_flush:1725) igt-aux-CRITICAL: Failed assertion: !"GPU hung"

<7>[  170.959810] [IGT] gem_exec_flush: starting subtest basic-uc-set-default
...
<7>[  177.768550] missed_breadcrumb rcs0 missed breadcrumb at intel_breadcrumbs_hangcheck+0x5a/0x80 [i915]
...
<6>[  180.857713] [drm] GPU HANG: ecode 9:0:0x8fdafffa, in gem_exec_flush [1725], reason: Hang on rcs0, action: reset
<7>[  180.858578] [drm:i915_reset_device [i915]] resetting chip
<5>[  180.858755] i915 0000:00:02.0: Resetting chip after gpu hang
<7>[  181.361556] [drm:intel_gpu_reset [i915]] rcs0: timed out on STOP_RING
<3>[  182.065059] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout

then:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3511/fi-glk-dsi/igt@gem_exec_reloc@basic-cpu-read.html
The result is incomplete. The pstore looks legitimate, but most of it is already in dmesg.

from dmesg:
<7>[  187.664410] [IGT] gem_exec_reloc: exiting, ret=77
<4>[  187.770081] WARNING: can't dereference iret registers at 00000000b08c140c for ip page_fault+0x7/0x30
<0>[  187.770083] BUG: stack guard page was hit at 00000000dd82e48c (stack is 00000000ca52e808..0000000085d6198d)
<4>[  187.770178] kernel stack overflow (double-fault): 0000 [#1] PREEMPT SMP
<0>[  187.770191] Dumping ftrace buffer:
<0>[  187.770199] ---------------------------------
<0>[  187.770281] CPU:3 [LOST 63587 EVENTS]
                  gem_exec-1729    3..s1 173324421us : execlists_submission_tasklet: rcs0 in[0]:  ctx=2.1, seqno=5d0bc
...
<4>[  187.801374] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul i915 ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me r8169 mii mei prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel
<4>[  187.801505] CPU: 2 PID: 1377 Comm: python3 Tainted: G     U  W        4.15.0-rc3-CI-CI_DRM_3511+ #1
<4>[  187.801531] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
<4>[  187.801564] RIP: 0010:page_fault+0x7/0x30
<4>[  187.801578] RSP: 0018:ffffc90001d83fa8 EFLAGS: 00010083
<4>[  187.801597] RAX: 0000000080000000 RBX: 0000000000000000 RCX: 0000000000000000
<4>[  187.801618] RDX: 0000000080000610 RSI: 0000000000000000 RDI: ffffc90001d840f8
<4>[  187.801639] RBP: 0000000080000610 R08: 0000000000000001 R09: 0101010101010101
<4>[  187.801659] R10: ffffc90001d87a90 R11: 0000000000000000 R12: ffffc90001d840f8
<4>[  187.801680] R13: ffff8801733c51c0 R14: 0000000000000001 R15: ffff8801733c51c0
<4>[  187.801702] FS:  00007fac1674e700(0000) GS:ffff88017fd00000(0000) knlGS:0000000000000000
<4>[  187.801726] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  187.801744] CR2: ffffc90001d83f98 CR3: 000000016f353000 CR4: 0000000000340ee0
<4>[  187.801764] Call Trace:
<4>[  187.801782]  ? no_context+0x3dc/0x430
<4>[  187.801800]  ? __do_page_fault+0x196/0x560
...
<1>[  187.804948] RIP: page_fault+0x7/0x30 RSP: ffffc90001d83fa8
<4>[  187.804968] ---[ end trace 7832dee94e24beea ]---
<3>[  188.000284] BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:34
<3>[  188.000315] in_atomic(): 1, irqs_disabled(): 1, pid: 1377, name: python3
<4>[  188.000334] INFO: lockdep is turned off.
<4>[  188.000347] irq event stamp: 1180122
<4>[  188.000367] hardirqs last  enabled at (1180121): [<00000000e846d9d1>] get_page_from_freelist+0x24c/0x14c0
<4>[  188.000395] hardirqs last disabled at (1180122): [<00000000804f94d3>] __slab_alloc.isra.24.constprop.29+0x19/0x70
<4>[  188.000425] softirqs last  enabled at (1179892): [<000000002b075771>] __do_softirq+0x3aa/0x4de
<4>[  188.000451] softirqs last disabled at (1179885): [<00000000a976b967>] irq_exit+0xaa/0xc0
<3>[  188.000473] Preemption disabled at:
<4>[  188.000478] [<000000005fa92adc>] ist_enter+0x1c/0xa0
<4>[  188.000507] CPU: 2 PID: 1377 Comm: python3 Tainted: G     UD W        4.15.0-rc3-CI-CI_DRM_3511+ #1
<4>[  188.000531] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
<4>[  188.000560] Call Trace:
<4>[  188.000578]  dump_stack+0x5f/0x86
<4>[  188.000593]  ___might_sleep+0x1d9/0x240
The trace then continues in:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3511/fi-glk-dsi/pstore0-1513197126_Oops_2.log
and the run actually ends with a softdog (watchdog) timeout:

<3>[  188.000473] Preemption disabled at:
<4>[  188.000478] [<000000005fa92adc>] ist_enter+0x1c/0xa0
<4>[  188.000507] CPU: 2 PID: 1377 Comm: python3 Tainted: G     UD W        4.15.0-rc3-CI-CI_DRM_3511+ #1
<4>[  188.000531] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
<4>[  188.000560] Call Trace:
<4>[  188.000578]  dump_stack+0x5f/0x86
<4>[  188.000593]  ___might_sleep+0x1d9/0x240
<4>[  188.000610]  exit_signals+0x1b/0x2a0
<4>[  188.000624]  do_exit+0x93/0xcc0
<4>[  188.000638]  ? trace_hardirqs_off_caller+0x75/0xd0
<4>[  188.000654]  ? do_syscall_64+0x19/0x1a0
<4>[  188.000671]  rewind_stack_do_exit+0x17/0x20
<6>[  188.000769] note: python3[1377] exited with preempt_count 1
<12>[  277.879085] owatch: TIMEOUT!
<12>[  277.879252] owatch: timeout for /dev/watchdog0 set to 10 (requested 10)
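
For anyone who wants to line up the sequence above (GPU hang, failed reset, stack overflow, watchdog timeout), here is a rough sketch that parses the `<prio>[timestamp]` prefix on these lines and reports the gaps between events. It assumes the usual syslog-style priority encoding (facility * 8 + level). The helper names `parse_kmsg_line` and `first_timestamp` are made up for this illustration and are not part of IGT or the CI tooling. The sample lines are copied from this comment, with the wrapped GPU HANG line re-joined.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Matches lines such as "<7>[  170.959810] [IGT] gem_exec_flush: ..."
KMSG_RE = re.compile(r"^<(\d+)>\[\s*(\d+\.\d+)\]\s?(.*)$")


class KmsgRecord(NamedTuple):
    facility: int     # priority >> 3 (assumed syslog-style encoding)
    level: int        # priority & 7 (0 = emerg ... 7 = debug)
    timestamp: float  # seconds since boot
    message: str


def parse_kmsg_line(line: str) -> Optional[KmsgRecord]:
    """Parse one prefixed log line; return None for continuation lines."""
    m = KMSG_RE.match(line)
    if m is None:
        return None
    prio = int(m.group(1))
    return KmsgRecord(prio >> 3, prio & 7, float(m.group(2)), m.group(3))


def first_timestamp(lines: Iterable[str], needle: str) -> Optional[float]:
    """Timestamp of the first record whose message contains `needle`."""
    for line in lines:
        rec = parse_kmsg_line(line)
        if rec is not None and needle in rec.message:
            return rec.timestamp
    return None


SAMPLE = [
    "<7>[  170.959810] [IGT] gem_exec_flush: starting subtest basic-uc-set-default",
    "<6>[  180.857713] [drm] GPU HANG: ecode 9:0:0x8fdafffa, in gem_exec_flush [1725], reason: Hang on rcs0, action: reset",
    "<4>[  187.770178] kernel stack overflow (double-fault): 0000 [#1] PREEMPT SMP",
    "<12>[  277.879085] owatch: TIMEOUT!",
]

hang = first_timestamp(SAMPLE, "GPU HANG")
overflow = first_timestamp(SAMPLE, "kernel stack overflow")
watchdog = first_timestamp(SAMPLE, "owatch: TIMEOUT")

print(f"GPU hang -> stack overflow: {overflow - hang:.3f} s")
print(f"stack overflow -> watchdog: {watchdog - overflow:.3f} s")
```

Wrapped continuation lines (the ones without a `<prio>[...]` prefix) come back as None, so a full log pasted from mail would need its wrapped lines joined first.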
