[Bug 82864] [HSW i915 MSI-7817] S4 resume on Haswell causes memory corruption (OOM, ext4_, ...)

Fri Nov 28 00:18:13 PST 2014

https://bugs.freedesktop.org/show_bug.cgi?id=82864

Imre Deak <imre.deak at intel.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |chris at chris-wilson.co.uk

--- Comment #18 from Imre Deak <imre.deak at intel.com> ---
(In reply to Jens from comment #17)
> I compiled 3.18.0rc6+ / linux-drm-nightly as of yesterday
> (a834a782adf3ab4b508cd80e9082960263bcc4ed) and did one pm-hibernate/resume
> cycle during "make -j4" in the kernel tree. Upon resume I get this:
> 
> [   40.501301] init: samba-ad-dc main process (1405) terminated with status 1
> [   55.521833] ------------[ cut here ]------------
> [   55.521853] WARNING: CPU: 3 PID: 1943 at
> drivers/gpu/drm/i915/i915_gem_execbuffer.c:125
> eb_lookup_vmas.isra.15+0x363/0x400 [i915]()
> [   55.521854] GPU use of dumb buffer is illegal.
> [   55.521855] Modules linked in: bnep(E) rfcomm(E) bluetooth(E)
> snd_hda_codec_realtek(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E)
> snd_hda_intel(E) snd_hda_controller(E) snd_hda_codec(E) snd_hwdep(E)
> intel_rapl(E) snd_pcm(E) x86_pkg_temp_thermal(E) intel_powerclamp(E)
> snd_seq_midi(E) snd_seq_midi_event(E) coretemp(E) snd_rawmidi(E) snd_seq(E)
> kvm_intel(E) snd_seq_device(E) kvm(E) snd_timer(E) snd(E) soundcore(E)
> mei_me(E) shpchp(E) mei(E) lpc_ich(E) serio_raw(E) tpm_infineon(E)
> intel_smartconnect(E) mac_hid(E) parport_pc(E) ppdev(E) lp(E) parport(E)
> dm_crypt(E) netconsole(E) configfs(E) hid_generic(E) usbhid(E) hid(E)
> mxm_wmi(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E)
> aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E)
> ablk_helper(E) cryptd(E) i915(E) ahci(E) i2c_algo_bit(E) libahci(E)
> drm_kms_helper(E) r8169(E) mii(E) drm(E) wmi(E) video(E)
> [   55.521873] CPU: 3 PID: 1943 Comm: Xorg Tainted: G            E 
> 3.18.0-rc6+ #7
> [   55.521874] Hardware name: MSI MS-7817/CSM-B85M-E45 (MS-7817), BIOS V10.5
> 05/30/2014
> [   55.521875]  0000000000000009 ffff8802108efb48 ffffffff81762cfc
> 0000000000000000
> [   55.521876]  ffff8802108efb98 ffff8802108efb88 ffffffff8106f0b1
> ffff8802108efc18
> [   55.521877]  ffff8802108efc38 ffff880210e73780 0000000000000001
> ffff880210e737b8
> [   55.521879] Call Trace:
> [   55.521882]  [<ffffffff81762cfc>] dump_stack+0x46/0x58
> [   55.521885]  [<ffffffff8106f0b1>] warn_slowpath_common+0x81/0xa0
> [   55.521887]  [<ffffffff8106f116>] warn_slowpath_fmt+0x46/0x50
> [   55.521896]  [<ffffffffa00e56b3>] eb_lookup_vmas.isra.15+0x363/0x400
> [i915]
> [   55.521904]  [<ffffffffa00e5c6d>]
> i915_gem_do_execbuffer.isra.22+0x51d/0xd90 [i915]
> [   55.521906]  [<ffffffff811bf12c>] ? kmem_cache_alloc_trace+0x3c/0x1f0
> [   55.521915]  [<ffffffffa00eca05>] ? i915_gem_object_get_pages+0x45/0xc0
> [i915]
> [   55.521923]  [<ffffffffa00e7601>] i915_gem_execbuffer2+0xb1/0x2c0 [i915]
> [   55.521930]  [<ffffffffa001aa54>] drm_ioctl+0x1a4/0x630 [drm]
> [   55.521933]  [<ffffffff81123f0c>] ? acct_account_cputime+0x1c/0x20
> [   55.521934]  [<ffffffff811f0520>] do_vfs_ioctl+0x2e0/0x4c0
> [   55.521937]  [<ffffffff8109e304>] ? vtime_account_user+0x54/0x60
> [   55.521938]  [<ffffffff811f0781>] SyS_ioctl+0x81/0xa0
> [   55.521940]  [<ffffffff8176b3b4>] ? int_check_syscall_exit_work+0x34/0x3d
> [   55.521942]  [<ffffffff8176b12d>] system_call_fastpath+0x16/0x1b
> [   55.521943] ---[ end trace 853866804709104b ]---
> [   55.832915] init: plymouth-upstart-bridge main process ended, respawning
> [   55.835816] init: plymouth-upstart-bridge main process (2918) terminated
> with status 1
> [   55.835831] init: plymouth-upstart-bridge main process ended, respawning
> [   58.563397] audit: type=1400 audit(1416991047.231:77): apparmor="STATUS"
> operation="profile_replace" name="/usr/lib/cups/backend/cups-pdf" pid=2981
> comm="apparmor_parser"
> [   58.563401] audit: type=1400 audit(1416991047.231:78): apparmor="STATUS"
> operation="profile_replace" name="/usr/sbin/cupsd" pid=2981
> comm="apparmor_parser"
> [   58.563595] audit: type=1400 audit(1416991047.231:79): apparmor="STATUS"
> operation="profile_replace" name="/usr/sbin/cupsd" pid=2981
> comm="apparmor_parser"
> [  815.742431] init: anacron main process (1210) killed by TERM signal
> [  819.770858] PM: Syncing filesystems ... done.
> [  820.315110] Freezing user space processes ... (elapsed 0.001 seconds)
> done.

This looks like a problem in X, which is trying to use an invalid GEM buffer
(a dumb buffer, going by the warning) for rendering. Does it really happen only
after S4 resume, or also during a normal boot? CC'ing Chris.
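
One way to compare the two cases is to save the dmesg output once after a
plain boot and once after an S4 resume, then search both for the warning text.
The following is only a rough sketch: the function name find_warnings and the
file names mentioned below are made up for illustration, and the only string
taken from the report is the warning message itself.

```python
import re

# Message text copied from the quoted WARNING above.
WARNING_TEXT = "GPU use of dumb buffer is illegal"

# Kernel log lines carry a "[ seconds.microseconds]" timestamp.
TIMESTAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def find_warnings(log_text, needle=WARNING_TEXT):
    """Return the kernel timestamps (in seconds) of lines containing needle.

    Hypothetical helper, not part of any existing tool. Lines without a
    recognizable timestamp are skipped.
    """
    hits = []
    for line in log_text.splitlines():
        if needle not in line:
            continue
        match = TIMESTAMP.search(line)
        if match:
            hits.append(float(match.group(1)))
    return hits
```

For example, find_warnings(open("dmesg-boot.txt").read()) on a log saved after
a normal boot (hypothetical file name) returns an empty list if the warning
never fired there, which would point towards the resume path.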

> However, no more crashes, freezes or Oopses.
> 
> Also, after a few suspend/resume cycles (twice in 12) I still have the
> problem that the network does not come up again after a resume. When it
> does, I get
> 
> [ 3846.934341] r8169 0000:02:00.0 eth0: link up
> 
> in dmesg. When it doesn't, I get
> 
> [ 6221.007206] show_signal_msg: 120 callbacks suppressed
> [ 6221.007209] Watchdog[2700]: segfault at 0 ip 00007ffe51c623e8 sp
> 00007ffe41dc7560 error 6 in libcontent.so[7ffe513e8000+11d8000]
> [ 6243.712345] Watchdog[29313]: segfault at 0 ip 00007f49e1a3d3e8 sp
> 00007f49d1ba2560 error 6 in libcontent.so[7f49e11c3000+11d8000]
> 
> but I don't know if these are related. I also occasionally get this
> 
> [ 6520.964686] Restarting tasks ... 
> [ 6520.964841] pci_bus 0000:04: Allocating resources
> [ 6520.964855] pci 0000:03:00.0: PCI bridge to [bus 04]
> [ 6520.964859] pci 0000:03:00.0:   bridge window [io  0x3000-0x3fff]
> [ 6520.964866] pci 0000:03:00.0:   bridge window [mem 0xdf600000-0xdf7fffff]
> [ 6520.964870] pci 0000:03:00.0:   bridge window [mem 0xdf800000-0xdf9fffff
> 64bit pref]
> [ 6520.968218] done.
> [ 6520.968224] video LNXVIDEO:00: Restoring backlight state
> [ 6528.107156] r8169 0000:02:00.0 eth0: link down
> [ 6528.107204] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
> [ 6528.107448] r8169 0000:02:00.0 eth0: link down
> [ 6531.536977] r8169 0000:02:00.0 eth0: link up
> [ 6531.536983] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> [ 6543.357696] ------------[ cut here ]------------
> [ 6543.357703] WARNING: CPU: 0 PID: 20681 at net/sched/sch_generic.c:303
> dev_watchdog+0x24f/0x260()
> [ 6543.357704] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
> [ 6543.357705] Modules linked in: bnep(E) rfcomm(E) bluetooth(E)
> snd_hda_codec_realtek(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E)
> snd_hda_intel(E) snd_hda_controller(E) snd_hda_codec(E) snd_hwdep(E)
> intel_rapl(E) snd_pcm(E) x86_pkg_temp_thermal(E) intel_powerclamp(E)
> snd_seq_midi(E) snd_seq_midi_event(E) coretemp(E) snd_rawmidi(E) snd_seq(E)
> kvm_intel(E) snd_seq_device(E) kvm(E) snd_timer(E) snd(E) soundcore(E)
> mei_me(E) shpchp(E) mei(E) lpc_ich(E) serio_raw(E) tpm_infineon(E)
> intel_smartconnect(E) mac_hid(E) parport_pc(E) ppdev(E) lp(E) parport(E)
> dm_crypt(E) netconsole(E) configfs(E) hid_generic(E) usbhid(E) hid(E)
> mxm_wmi(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E)
> aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E)
> ablk_helper(E) cryptd(E) i915(E) ahci(E) i2c_algo_bit(E) libahci(E)
> drm_kms_helper(E) r8169(E) mii(E) drm(E) wmi(E) video(E)
> [ 6543.357738] CPU: 0 PID: 20681 Comm: cc1 Tainted: G        W   E 
> 3.18.0-rc6+ #7
> [ 6543.357739] Hardware name: MSI MS-7817/CSM-B85M-E45 (MS-7817), BIOS V10.5
> 05/30/2014
> [ 6543.357740]  0000000000000009 ffff88021ea03d48 ffffffff81762cfc
> 0000000000000000
> [ 6543.357741]  ffff88021ea03d98 ffff88021ea03d88 ffffffff8106f0b1
> ffff88021ea03d70
> [ 6543.357743]  0000000000000000 ffff88020fb08000 0000000000000001
> ffff8800c65e1e80
> [ 6543.357744] Call Trace:
> [ 6543.357745]  <IRQ>  [<ffffffff81762cfc>] dump_stack+0x46/0x58
> [ 6543.357751]  [<ffffffff8106f0b1>] warn_slowpath_common+0x81/0xa0
> [ 6543.357753]  [<ffffffff8106f116>] warn_slowpath_fmt+0x46/0x50
> [ 6543.357755]  [<ffffffff8168469f>] dev_watchdog+0x24f/0x260
> [ 6543.357756]  [<ffffffff81684450>] ? dev_graft_qdisc+0x80/0x80
> [ 6543.357759]  [<ffffffff810d39fa>] call_timer_fn+0x3a/0x110
> [ 6543.357760]  [<ffffffff81684450>] ? dev_graft_qdisc+0x80/0x80
> [ 6543.357762]  [<ffffffff810d519f>] run_timer_softirq+0x20f/0x310
> [ 6543.357763]  [<ffffffff810731b5>] __do_softirq+0xf5/0x2d0
> [ 6543.357764]  [<ffffffff81073685>] irq_exit+0x115/0x120
> [ 6543.357766]  [<ffffffff8176dfaa>] smp_apic_timer_interrupt+0x4a/0x60
> [ 6543.357769]  [<ffffffff8176c07d>] apic_timer_interrupt+0x6d/0x80
> [ 6543.357769]  <EOI> 
> [ 6543.357770] ---[ end trace 853866804709104c ]---
> [ 6543.375603] r8169 0000:02:00.0 eth0: link up
> 
> after which the network works again.
> 
> Is the network issue being worked on actively? If so, I can try on a second
> machine and report back.

I'm not sure, but this is a network driver (r8169) problem, so could you let
its maintainers know about it? IIRC you already opened a bug for this.
