[Bug 82864] [HSW i915 MSI-7817] S4 resume on Haswell causes memory corruption (OOM, ext4_, ...)
bugzilla-daemon at freedesktop.org
Fri Nov 28 00:18:13 PST 2014
https://bugs.freedesktop.org/show_bug.cgi?id=82864
Imre Deak <imre.deak at intel.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |chris at chris-wilson.co.uk
--- Comment #18 from Imre Deak <imre.deak at intel.com> ---
(In reply to Jens from comment #17)
> I compiled 3.18.0rc6+ / linux-drm-nightly as of yesterday
> (a834a782adf3ab4b508cd80e9082960263bcc4ed) and did one pm-hibernate/resume
> cycle during "make -j4" in the kernel tree. Upon resume I get this:
>
> [ 40.501301] init: samba-ad-dc main process (1405) terminated with status 1
> [ 55.521833] ------------[ cut here ]------------
> [ 55.521853] WARNING: CPU: 3 PID: 1943 at
> drivers/gpu/drm/i915/i915_gem_execbuffer.c:125
> eb_lookup_vmas.isra.15+0x363/0x400 [i915]()
> [ 55.521854] GPU use of dumb buffer is illegal.
> [ 55.521855] Modules linked in: bnep(E) rfcomm(E) bluetooth(E)
> snd_hda_codec_realtek(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E)
> snd_hda_intel(E) snd_hda_controller(E) snd_hda_codec(E) snd_hwdep(E)
> intel_rapl(E) snd_pcm(E) x86_pkg_temp_thermal(E) intel_powerclamp(E)
> snd_seq_midi(E) snd_seq_midi_event(E) coretemp(E) snd_rawmidi(E) snd_seq(E)
> kvm_intel(E) snd_seq_device(E) kvm(E) snd_timer(E) snd(E) soundcore(E)
> mei_me(E) shpchp(E) mei(E) lpc_ich(E) serio_raw(E) tpm_infineon(E)
> intel_smartconnect(E) mac_hid(E) parport_pc(E) ppdev(E) lp(E) parport(E)
> dm_crypt(E) netconsole(E) configfs(E) hid_generic(E) usbhid(E) hid(E)
> mxm_wmi(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E)
> aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E)
> ablk_helper(E) cryptd(E) i915(E) ahci(E) i2c_algo_bit(E) libahci(E)
> drm_kms_helper(E) r8169(E) mii(E) drm(E) wmi(E) video(E)
> [ 55.521873] CPU: 3 PID: 1943 Comm: Xorg Tainted: G E
> 3.18.0-rc6+ #7
> [ 55.521874] Hardware name: MSI MS-7817/CSM-B85M-E45 (MS-7817), BIOS V10.5
> 05/30/2014
> [ 55.521875] 0000000000000009 ffff8802108efb48 ffffffff81762cfc
> 0000000000000000
> [ 55.521876] ffff8802108efb98 ffff8802108efb88 ffffffff8106f0b1
> ffff8802108efc18
> [ 55.521877] ffff8802108efc38 ffff880210e73780 0000000000000001
> ffff880210e737b8
> [ 55.521879] Call Trace:
> [ 55.521882] [<ffffffff81762cfc>] dump_stack+0x46/0x58
> [ 55.521885] [<ffffffff8106f0b1>] warn_slowpath_common+0x81/0xa0
> [ 55.521887] [<ffffffff8106f116>] warn_slowpath_fmt+0x46/0x50
> [ 55.521896] [<ffffffffa00e56b3>] eb_lookup_vmas.isra.15+0x363/0x400
> [i915]
> [ 55.521904] [<ffffffffa00e5c6d>]
> i915_gem_do_execbuffer.isra.22+0x51d/0xd90 [i915]
> [ 55.521906] [<ffffffff811bf12c>] ? kmem_cache_alloc_trace+0x3c/0x1f0
> [ 55.521915] [<ffffffffa00eca05>] ? i915_gem_object_get_pages+0x45/0xc0
> [i915]
> [ 55.521923] [<ffffffffa00e7601>] i915_gem_execbuffer2+0xb1/0x2c0 [i915]
> [ 55.521930] [<ffffffffa001aa54>] drm_ioctl+0x1a4/0x630 [drm]
> [ 55.521933] [<ffffffff81123f0c>] ? acct_account_cputime+0x1c/0x20
> [ 55.521934] [<ffffffff811f0520>] do_vfs_ioctl+0x2e0/0x4c0
> [ 55.521937] [<ffffffff8109e304>] ? vtime_account_user+0x54/0x60
> [ 55.521938] [<ffffffff811f0781>] SyS_ioctl+0x81/0xa0
> [ 55.521940] [<ffffffff8176b3b4>] ? int_check_syscall_exit_work+0x34/0x3d
> [ 55.521942] [<ffffffff8176b12d>] system_call_fastpath+0x16/0x1b
> [ 55.521943] ---[ end trace 853866804709104b ]---
> [ 55.832915] init: plymouth-upstart-bridge main process ended, respawning
> [ 55.835816] init: plymouth-upstart-bridge main process (2918) terminated
> with status 1
> [ 55.835831] init: plymouth-upstart-bridge main process ended, respawning
> [ 58.563397] audit: type=1400 audit(1416991047.231:77): apparmor="STATUS"
> operation="profile_replace" name="/usr/lib/cups/backend/cups-pdf" pid=2981
> comm="apparmor_parser"
> [ 58.563401] audit: type=1400 audit(1416991047.231:78): apparmor="STATUS"
> operation="profile_replace" name="/usr/sbin/cupsd" pid=2981
> comm="apparmor_parser"
> [ 58.563595] audit: type=1400 audit(1416991047.231:79): apparmor="STATUS"
> operation="profile_replace" name="/usr/sbin/cupsd" pid=2981
> comm="apparmor_parser"
> [ 815.742431] init: anacron main process (1210) killed by TERM signal
> [ 819.770858] PM: Syncing filesystems ... done.
> [ 820.315110] Freezing user space processes ... (elapsed 0.001 seconds)
> done.
This looks like a problem in X: it is trying to use an invalid GEM buffer (a
dumb buffer) for rendering. Does it really happen only after S4 resume, or also
after a normal boot? CC'ing Chris.
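One way to answer that from a saved dmesg capture is sketched below. It is only an illustration: the function name `classify_warnings` is hypothetical, and it assumes that the first "PM: Syncing filesystems" line marks the first hibernation attempt (the same message is also printed for other sleep states, so treat the result as a hint, not proof). It compares the timestamp of each "GPU use of dumb buffer is illegal" line against that marker.

```python
import re

# Matches a dmesg line such as "[ 55.521854] GPU use of dumb buffer is illegal."
# A leading "> " mail quote marker is tolerated.
LINE_RE = re.compile(r"^(?:>\s*)?\[\s*(\d+\.\d+)\]\s*(.*)$")

WARNING_TEXT = "GPU use of dumb buffer is illegal"
# Assumption: this message marks the start of a hibernation attempt.
HIBERNATE_MARKER = "PM: Syncing filesystems"


def classify_warnings(log_text):
    """Split warning timestamps into (before, after) the first hibernate marker.

    If no marker is found, every warning is reported as "before".
    """
    first_hibernate = None
    warnings = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # wrapped continuation lines carry no timestamp
        ts, msg = float(m.group(1)), m.group(2)
        if HIBERNATE_MARKER in msg:
            if first_hibernate is None:
                first_hibernate = ts
        elif WARNING_TEXT in msg:
            warnings.append(ts)
    if first_hibernate is None:
        return warnings, []
    before = [t for t in warnings if t < first_hibernate]
    after = [t for t in warnings if t >= first_hibernate]
    return before, after


if __name__ == "__main__":
    import sys
    before, after = classify_warnings(sys.stdin.read())
    print("warnings before first hibernate marker:", before)
    print("warnings after first hibernate marker:", after)
```

Piping `dmesg` output into the script would be the intended use. A warning listed under "before" would suggest that the problem is not specific to S4 resume.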
> However, no more crashes, freezes or Oopses.
>
> Also, after a few suspend/resume cycles (twice in 12) I still have the
> problem that the network does not come up again after a resume. When it
> does, I get
>
> [ 3846.934341] r8169 0000:02:00.0 eth0: link up
>
> in dmesg. When it doesn't, I get
>
> [ 6221.007206] show_signal_msg: 120 callbacks suppressed
> [ 6221.007209] Watchdog[2700]: segfault at 0 ip 00007ffe51c623e8 sp
> 00007ffe41dc7560 error 6 in libcontent.so[7ffe513e8000+11d8000]
> [ 6243.712345] Watchdog[29313]: segfault at 0 ip 00007f49e1a3d3e8 sp
> 00007f49d1ba2560 error 6 in libcontent.so[7f49e11c3000+11d8000]
>
> but I don't know if these are related. I also occasionally get this
>
> [ 6520.964686] Restarting tasks ...
> [ 6520.964841] pci_bus 0000:04: Allocating resources
> [ 6520.964855] pci 0000:03:00.0: PCI bridge to [bus 04]
> [ 6520.964859] pci 0000:03:00.0: bridge window [io 0x3000-0x3fff]
> [ 6520.964866] pci 0000:03:00.0: bridge window [mem 0xdf600000-0xdf7fffff]
> [ 6520.964870] pci 0000:03:00.0: bridge window [mem 0xdf800000-0xdf9fffff
> 64bit pref]
> [ 6520.968218] done.
> [ 6520.968224] video LNXVIDEO:00: Restoring backlight state
> [ 6528.107156] r8169 0000:02:00.0 eth0: link down
> [ 6528.107204] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
> [ 6528.107448] r8169 0000:02:00.0 eth0: link down
> [ 6531.536977] r8169 0000:02:00.0 eth0: link up
> [ 6531.536983] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> [ 6543.357696] ------------[ cut here ]------------
> [ 6543.357703] WARNING: CPU: 0 PID: 20681 at net/sched/sch_generic.c:303
> dev_watchdog+0x24f/0x260()
> [ 6543.357704] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
> [ 6543.357705] Modules linked in: bnep(E) rfcomm(E) bluetooth(E)
> snd_hda_codec_realtek(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E)
> snd_hda_intel(E) snd_hda_controller(E) snd_hda_codec(E) snd_hwdep(E)
> intel_rapl(E) snd_pcm(E) x86_pkg_temp_thermal(E) intel_powerclamp(E)
> snd_seq_midi(E) snd_seq_midi_event(E) coretemp(E) snd_rawmidi(E) snd_seq(E)
> kvm_intel(E) snd_seq_device(E) kvm(E) snd_timer(E) snd(E) soundcore(E)
> mei_me(E) shpchp(E) mei(E) lpc_ich(E) serio_raw(E) tpm_infineon(E)
> intel_smartconnect(E) mac_hid(E) parport_pc(E) ppdev(E) lp(E) parport(E)
> dm_crypt(E) netconsole(E) configfs(E) hid_generic(E) usbhid(E) hid(E)
> mxm_wmi(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E)
> aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E)
> ablk_helper(E) cryptd(E) i915(E) ahci(E) i2c_algo_bit(E) libahci(E)
> drm_kms_helper(E) r8169(E) mii(E) drm(E) wmi(E) video(E)
> [ 6543.357738] CPU: 0 PID: 20681 Comm: cc1 Tainted: G W E
> 3.18.0-rc6+ #7
> [ 6543.357739] Hardware name: MSI MS-7817/CSM-B85M-E45 (MS-7817), BIOS V10.5
> 05/30/2014
> [ 6543.357740] 0000000000000009 ffff88021ea03d48 ffffffff81762cfc
> 0000000000000000
> [ 6543.357741] ffff88021ea03d98 ffff88021ea03d88 ffffffff8106f0b1
> ffff88021ea03d70
> [ 6543.357743] 0000000000000000 ffff88020fb08000 0000000000000001
> ffff8800c65e1e80
> [ 6543.357744] Call Trace:
> [ 6543.357745] <IRQ> [<ffffffff81762cfc>] dump_stack+0x46/0x58
> [ 6543.357751] [<ffffffff8106f0b1>] warn_slowpath_common+0x81/0xa0
> [ 6543.357753] [<ffffffff8106f116>] warn_slowpath_fmt+0x46/0x50
> [ 6543.357755] [<ffffffff8168469f>] dev_watchdog+0x24f/0x260
> [ 6543.357756] [<ffffffff81684450>] ? dev_graft_qdisc+0x80/0x80
> [ 6543.357759] [<ffffffff810d39fa>] call_timer_fn+0x3a/0x110
> [ 6543.357760] [<ffffffff81684450>] ? dev_graft_qdisc+0x80/0x80
> [ 6543.357762] [<ffffffff810d519f>] run_timer_softirq+0x20f/0x310
> [ 6543.357763] [<ffffffff810731b5>] __do_softirq+0xf5/0x2d0
> [ 6543.357764] [<ffffffff81073685>] irq_exit+0x115/0x120
> [ 6543.357766] [<ffffffff8176dfaa>] smp_apic_timer_interrupt+0x4a/0x60
> [ 6543.357769] [<ffffffff8176c07d>] apic_timer_interrupt+0x6d/0x80
> [ 6543.357769] <EOI>
> [ 6543.357770] ---[ end trace 853866804709104c ]---
> [ 6543.375603] r8169 0000:02:00.0 eth0: link up
>
> after which the network works again.
>
> Is the network issue being worked on actively? If so, I can try on a second
> machine and report back.
I'm not sure, but this is a network driver (r8169) problem, so could you let
its maintainers know about it? IIRC you already opened a bug about this.