[Bug 82864] [HSW i915 MSI-7817] S4 resume on Haswell causes memory corruption (OOM, ext4_, ...)

Thu Nov 27 10:06:39 PST 2014

https://bugs.freedesktop.org/show_bug.cgi?id=82864

--- Comment #17 from Jens <jens-bugs.freedesktop.org at spamfreemail.de> ---
I compiled 3.18.0-rc6+ / linux-drm-nightly as of yesterday
(a834a782adf3ab4b508cd80e9082960263bcc4ed) and did one pm-hibernate/resume
cycle while "make -j4" was running in the kernel tree. Upon resume I got this:

[   40.501301] init: samba-ad-dc main process (1405) terminated with status 1
[   55.521833] ------------[ cut here ]------------
[   55.521853] WARNING: CPU: 3 PID: 1943 at drivers/gpu/drm/i915/i915_gem_execbuffer.c:125 eb_lookup_vmas.isra.15+0x363/0x400 [i915]()
[   55.521854] GPU use of dumb buffer is illegal.
[   55.521855] Modules linked in: bnep(E) rfcomm(E) bluetooth(E)
snd_hda_codec_realtek(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E)
snd_hda_intel(E) snd_hda_controller(E) snd_hda_codec(E) snd_hwdep(E)
intel_rapl(E) snd_pcm(E) x86_pkg_temp_thermal(E) intel_powerclamp(E)
snd_seq_midi(E) snd_seq_midi_event(E) coretemp(E) snd_rawmidi(E) snd_seq(E)
kvm_intel(E) snd_seq_device(E) kvm(E) snd_timer(E) snd(E) soundcore(E)
mei_me(E) shpchp(E) mei(E) lpc_ich(E) serio_raw(E) tpm_infineon(E)
intel_smartconnect(E) mac_hid(E) parport_pc(E) ppdev(E) lp(E) parport(E)
dm_crypt(E) netconsole(E) configfs(E) hid_generic(E) usbhid(E) hid(E)
mxm_wmi(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E)
aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E)
cryptd(E) i915(E) ahci(E) i2c_algo_bit(E) libahci(E) drm_kms_helper(E) r8169(E)
mii(E) drm(E) wmi(E) video(E)
[   55.521873] CPU: 3 PID: 1943 Comm: Xorg Tainted: G            E  3.18.0-rc6+ #7
[   55.521874] Hardware name: MSI MS-7817/CSM-B85M-E45 (MS-7817), BIOS V10.5 05/30/2014
[   55.521875]  0000000000000009 ffff8802108efb48 ffffffff81762cfc 0000000000000000
[   55.521876]  ffff8802108efb98 ffff8802108efb88 ffffffff8106f0b1 ffff8802108efc18
[   55.521877]  ffff8802108efc38 ffff880210e73780 0000000000000001 ffff880210e737b8
[   55.521879] Call Trace:
[   55.521882]  [<ffffffff81762cfc>] dump_stack+0x46/0x58
[   55.521885]  [<ffffffff8106f0b1>] warn_slowpath_common+0x81/0xa0
[   55.521887]  [<ffffffff8106f116>] warn_slowpath_fmt+0x46/0x50
[   55.521896]  [<ffffffffa00e56b3>] eb_lookup_vmas.isra.15+0x363/0x400 [i915]
[   55.521904]  [<ffffffffa00e5c6d>] i915_gem_do_execbuffer.isra.22+0x51d/0xd90 [i915]
[   55.521906]  [<ffffffff811bf12c>] ? kmem_cache_alloc_trace+0x3c/0x1f0
[   55.521915]  [<ffffffffa00eca05>] ? i915_gem_object_get_pages+0x45/0xc0 [i915]
[   55.521923]  [<ffffffffa00e7601>] i915_gem_execbuffer2+0xb1/0x2c0 [i915]
[   55.521930]  [<ffffffffa001aa54>] drm_ioctl+0x1a4/0x630 [drm]
[   55.521933]  [<ffffffff81123f0c>] ? acct_account_cputime+0x1c/0x20
[   55.521934]  [<ffffffff811f0520>] do_vfs_ioctl+0x2e0/0x4c0
[   55.521937]  [<ffffffff8109e304>] ? vtime_account_user+0x54/0x60
[   55.521938]  [<ffffffff811f0781>] SyS_ioctl+0x81/0xa0
[   55.521940]  [<ffffffff8176b3b4>] ? int_check_syscall_exit_work+0x34/0x3d
[   55.521942]  [<ffffffff8176b12d>] system_call_fastpath+0x16/0x1b
[   55.521943] ---[ end trace 853866804709104b ]---
[   55.832915] init: plymouth-upstart-bridge main process ended, respawning
[   55.835816] init: plymouth-upstart-bridge main process (2918) terminated with status 1
[   55.835831] init: plymouth-upstart-bridge main process ended, respawning
[   58.563397] audit: type=1400 audit(1416991047.231:77): apparmor="STATUS" operation="profile_replace" name="/usr/lib/cups/backend/cups-pdf" pid=2981 comm="apparmor_parser"
[   58.563401] audit: type=1400 audit(1416991047.231:78): apparmor="STATUS" operation="profile_replace" name="/usr/sbin/cupsd" pid=2981 comm="apparmor_parser"
[   58.563595] audit: type=1400 audit(1416991047.231:79): apparmor="STATUS" operation="profile_replace" name="/usr/sbin/cupsd" pid=2981 comm="apparmor_parser"
[  815.742431] init: anacron main process (1210) killed by TERM signal
[  819.770858] PM: Syncing filesystems ... done.
[  820.315110] Freezing user space processes ... (elapsed 0.001 seconds) done.

However, there were no more crashes, freezes or Oopses.

Also, over several suspend/resume cycles (2 out of 12) I still have the problem
that the network does not come up again after a resume. When it does come up, I get

[ 3846.934341] r8169 0000:02:00.0 eth0: link up

in dmesg. When it doesn't, I get

[ 6221.007206] show_signal_msg: 120 callbacks suppressed
[ 6221.007209] Watchdog[2700]: segfault at 0 ip 00007ffe51c623e8 sp 00007ffe41dc7560 error 6 in libcontent.so[7ffe513e8000+11d8000]
[ 6243.712345] Watchdog[29313]: segfault at 0 ip 00007f49e1a3d3e8 sp 00007f49d1ba2560 error 6 in libcontent.so[7f49e11c3000+11d8000]

but I don't know if these are related. I also occasionally get this:

[ 6520.964686] Restarting tasks ... 
[ 6520.964841] pci_bus 0000:04: Allocating resources
[ 6520.964855] pci 0000:03:00.0: PCI bridge to [bus 04]
[ 6520.964859] pci 0000:03:00.0:   bridge window [io  0x3000-0x3fff]
[ 6520.964866] pci 0000:03:00.0:   bridge window [mem 0xdf600000-0xdf7fffff]
[ 6520.964870] pci 0000:03:00.0:   bridge window [mem 0xdf800000-0xdf9fffff 64bit pref]
[ 6520.968218] done.
[ 6520.968224] video LNXVIDEO:00: Restoring backlight state
[ 6528.107156] r8169 0000:02:00.0 eth0: link down
[ 6528.107204] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 6528.107448] r8169 0000:02:00.0 eth0: link down
[ 6531.536977] r8169 0000:02:00.0 eth0: link up
[ 6531.536983] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 6543.357696] ------------[ cut here ]------------
[ 6543.357703] WARNING: CPU: 0 PID: 20681 at net/sched/sch_generic.c:303 dev_watchdog+0x24f/0x260()
[ 6543.357704] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
[ 6543.357705] Modules linked in: bnep(E) rfcomm(E) bluetooth(E)
snd_hda_codec_realtek(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E)
snd_hda_intel(E) snd_hda_controller(E) snd_hda_codec(E) snd_hwdep(E)
intel_rapl(E) snd_pcm(E) x86_pkg_temp_thermal(E) intel_powerclamp(E)
snd_seq_midi(E) snd_seq_midi_event(E) coretemp(E) snd_rawmidi(E) snd_seq(E)
kvm_intel(E) snd_seq_device(E) kvm(E) snd_timer(E) snd(E) soundcore(E)
mei_me(E) shpchp(E) mei(E) lpc_ich(E) serio_raw(E) tpm_infineon(E)
intel_smartconnect(E) mac_hid(E) parport_pc(E) ppdev(E) lp(E) parport(E)
dm_crypt(E) netconsole(E) configfs(E) hid_generic(E) usbhid(E) hid(E)
mxm_wmi(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E)
aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E)
cryptd(E) i915(E) ahci(E) i2c_algo_bit(E) libahci(E) drm_kms_helper(E) r8169(E)
mii(E) drm(E) wmi(E) video(E)
[ 6543.357738] CPU: 0 PID: 20681 Comm: cc1 Tainted: G        W   E  3.18.0-rc6+ #7
[ 6543.357739] Hardware name: MSI MS-7817/CSM-B85M-E45 (MS-7817), BIOS V10.5 05/30/2014
[ 6543.357740]  0000000000000009 ffff88021ea03d48 ffffffff81762cfc 0000000000000000
[ 6543.357741]  ffff88021ea03d98 ffff88021ea03d88 ffffffff8106f0b1 ffff88021ea03d70
[ 6543.357743]  0000000000000000 ffff88020fb08000 0000000000000001 ffff8800c65e1e80
[ 6543.357744] Call Trace:
[ 6543.357745]  <IRQ>  [<ffffffff81762cfc>] dump_stack+0x46/0x58
[ 6543.357751]  [<ffffffff8106f0b1>] warn_slowpath_common+0x81/0xa0
[ 6543.357753]  [<ffffffff8106f116>] warn_slowpath_fmt+0x46/0x50
[ 6543.357755]  [<ffffffff8168469f>] dev_watchdog+0x24f/0x260
[ 6543.357756]  [<ffffffff81684450>] ? dev_graft_qdisc+0x80/0x80
[ 6543.357759]  [<ffffffff810d39fa>] call_timer_fn+0x3a/0x110
[ 6543.357760]  [<ffffffff81684450>] ? dev_graft_qdisc+0x80/0x80
[ 6543.357762]  [<ffffffff810d519f>] run_timer_softirq+0x20f/0x310
[ 6543.357763]  [<ffffffff810731b5>] __do_softirq+0xf5/0x2d0
[ 6543.357764]  [<ffffffff81073685>] irq_exit+0x115/0x120
[ 6543.357766]  [<ffffffff8176dfaa>] smp_apic_timer_interrupt+0x4a/0x60
[ 6543.357769]  [<ffffffff8176c07d>] apic_timer_interrupt+0x6d/0x80
[ 6543.357769]  <EOI> 
[ 6543.357770] ---[ end trace 853866804709104c ]---
[ 6543.375603] r8169 0000:02:00.0 eth0: link up

after which the network works again.
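
For counting how often the link comes back after a resume, a minimal sketch
that scans saved dmesg text may help. It assumes that a "Restarting tasks" line
marks each resume and that an r8169 "link up" line within some time window
means the network recovered. The function name and the 30-second window are
hypothetical choices for illustration, not values taken from this report.

```python
import re

# Leading "[ seconds.micros]" dmesg timestamp, then the message text.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s(.*)$")


def link_up_after_resume(dmesg_text, window=30.0):
    """Return one (resume_time, link_came_up) pair per "Restarting tasks" line.

    window is an assumed cutoff in seconds, not a value from the report.
    """
    events = []
    for raw in dmesg_text.splitlines():
        m = LINE_RE.match(raw)
        if m:  # wrapped continuation lines carry no timestamp and are skipped
            events.append((float(m.group(1)), m.group(2).strip()))

    results = []
    for i, (t, msg) in enumerate(events):
        if not msg.startswith("Restarting tasks"):
            continue
        # Look for an r8169 "link up" message shortly after this resume.
        came_up = any(
            "r8169" in later_msg and later_msg.endswith("link up")
            for later_t, later_msg in events[i + 1:]
            if later_t - t <= window
        )
        results.append((t, came_up))
    return results
```

Feeding it the output of "dmesg" after a series of cycles gives one entry per
resume, so the failing cycles can be counted instead of estimated.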

Is the network issue being worked on actively? If so, I can try on a second
machine and report back.
