[Bug 59321] New: S4 broken with Haswell
bugzilla-daemon at bugzilla.kernel.org
Wed Jun 5 02:27:19 PDT 2013
https://bugzilla.kernel.org/show_bug.cgi?id=59321
Summary: S4 broken with Haswell
Product: Drivers
Version: 2.5
Platform: All
OS/Version: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Video(DRI - Intel)
AssignedTo: intel-gfx-bugs at lists.freedesktop.org
ReportedBy: tiwai at suse.de
CC: intel-gfx-bugs at lists.freedesktop.org
Regression: No
On laptops with Haswell, the machine hangs after a number of S4 (hibernation)
cycles, typically within 20 cycles. 3.10-rc4 is the most unstable and usually
hangs within a couple of S4 cycles.
With luck, you get an Oops message like the one below before the machine slowly dies.
general protection fault: 0000 [#1] SMP
CPU: 3 PID: 3804 Comm: packagekitd Tainted: GF 3.10.0-rc4-test+ #1
task: ffff880231ea8380 ti: ffff88022e138000 task.ti: ffff88q6
RIP: 0010:[<ffffffff81166ed0>] [<ffffffff81166ed0>] path_lookupat+0x120/0x830
RSP: 0018:ffff88022e139cd8 EFLAGS: 00010246
RAX: 00f9000000f80000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff88022e139d18 RSI: 0000000000000000 RDI: ffff88022ed4e740
RBP: ffff88022e139d58 R08: ffff88022e139c3f R09: ffff8802358e303e
R10: ffff88022ed4e778 R11: 0000000000000003 R12: ffff88022ec38da0
R13: ffff88022e139da8 R14: 0000000000000000 R15: ffff88022e139d08
FS: 00007f509d934700(0000) GS:ffff88023eac0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f509d924cd8 CR3: 0000000231418000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
ffff88022e139cf8 0000000000000000 ffff88022e139cf8 ffffffff81162945
ffff88022e139d98 0000000000000286 ffff8802360673a0 ffff88022ed4e740
ffff88022ec38da0 0000000000000000 000000d02e139d68 ffff8802358e3000
Call Trace:
[<ffffffff81162945>] ? terminate_walk+0x35/0x40
[<ffffffff81167613>] filename_lookup+0x33/0xd0
[<ffffffff8116877b>] user_path_at_empty+0x7b/0xb0
[<ffffffff8117681c>] ? mntput_no_expire+0x4c/0x1b0
[<ffffffff8115d5c7>] ? cp_new_stat+0x137/0x150
[<ffffffff811687bc>] user_path_at+0xc/0x10
[<ffffffff8115d881>] vfs_fstatat+0x51/0xb0
[<ffffffff8115d949>] vfs_lstat+0x19/0x20
[<ffffffff8115d96f>] SyS_newlstat+0x1f/0x50
[<ffffffff814701d2>] system_call_fastpath+0x16/0x1b
Code: ff ff 83 f8 00 89 c3 0f 85 6e 06 00 00 4c 8b 65 c0 4d 85 e4 0f 84 47 06
00 00 41 f6 44 24 02 04 0f 85 a2 01 00 00 49 8b 44 24 20 <48> 83 78 08 00 0f 84
71 01 00 00 41 83 e6 01 90 0f 84 87 01 00
RIP [<ffffffff81166ed0>] path_lookupat+0x120/0x830
RSP <ffff88022e139cd8>
The Oops patterns vary quite a lot, but most of them are related to VFS path
lookup. For example, another typical Oops looks like the one below (in this
case, it was on a 3.0-based kernel with drm/i915 backports, but it is also seen
on all kernels):
BUG: soft lockup - CPU#0 stuck for 23s! [sh:11043]
CPU 0
Pid: 11043, comm: sh Tainted: G D NX 3.0.65-0.6.6.1.5358.1.PTF-default
RIP: 0010:[<ffffffff81445c58>] [<ffffffff81445c58>] _raw_spin_lock+0x18/0x20
RSP: 0018:ffff8801b384fc40 EFLAGS: 00000297
RAX: 000000000000f221 RBX: ffff8801b384fc78 RCX: 0000000000013568
RDX: 000000000000f220 RSI: ffffc90000878760 RDI: ffffffff81a02700
RBP: ffffc90000878760 R08: 0000000000000007 R09: 0000000000000025
R10: 0000000000000007 R11: ffffffff811e17e0 R12: ffffffff8144e2ee
R13: 0000000000000000 R14: 00000002000200da R15: 0000000000000000
FS: 00007ff4bd5f1700(0000) GS:ffff8801bfa00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff4bcd88428 CR3: 00000001b3970000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process sh (pid: 11043, threadinfo ffff8801b384e000, task ffff8801543f64c0)
Stack:
ffffffff81168fa0 ffff88018d19e838 ffffffff8116aa75 0000000000000000
ffff8801b331fbc0 ffff8801b331fbc0 ffff8801bec04d58 0000000000000000
ffffffff811ad190 ffff8801bec04d58 ffff8801b331fbc0 ffff880190f71540
Call Trace:
[<ffffffff81168fa0>] inode_sb_list_add+0x10/0x50
[<ffffffff8116aa75>] iget_locked+0x155/0x170
[<ffffffff811ad190>] proc_get_inode+0x10/0x110
[<ffffffff811b3dd9>] proc_lookup_de+0x69/0xe0
[<ffffffff811adc20>] proc_root_lookup+0x20/0x60
[<ffffffff8115b012>] d_alloc_and_lookup+0x42/0x80
[<ffffffff8115c7c5>] do_lookup+0x2a5/0x3a0
[<ffffffff8115d992>] do_last+0x102/0x800
[<ffffffff8115ecf9>] path_openat+0xd9/0x420
[<ffffffff8115f17c>] do_filp_open+0x4c/0xc0
[<ffffffff8114fdc1>] do_sys_open+0x171/0x1f0
[<ffffffff8144d912>] system_call_fastpath+0x16/0x1b
[<00007ff4bcd18da0>] 0x7ff4bcd18d9f
Code: 0f 95 c0 0f b6 c0 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 b8 00 00 01 00
00 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 07 f3 90 0f b7 17 <eb> f5 c3 0f 1f 44 00
00 9c 58 0f 1f 44 00 00 48 89 c6 fa 66 0f
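Since the Oops patterns vary, it can help to pull the symbol names out of each captured trace and check whether they point at path lookup. The following is only a rough sketch for doing that with saved netconsole output; the function names and the VFS_HINTS list are illustrative assumptions, not part of any kernel tooling.

```python
import re

# Matches frames such as "[<ffffffff81167613>] filename_lookup+0x33/0xd0",
# including unreliable frames marked with "?". The RIP line has the same
# shape, so its symbol is picked up as well.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

# Hypothetical hint list: substrings that suggest VFS path-lookup code.
VFS_HINTS = ("lookup", "path_", "do_last", "do_filp_open")


def trace_symbols(oops_text):
    """Return (symbol, reliable) pairs for every frame found in the text."""
    return [(m.group(2), m.group(1) is None) for m in FRAME_RE.finditer(oops_text)]


def looks_like_vfs_lookup(oops_text):
    """True if any reliable frame name contains one of the VFS hints."""
    return any(
        reliable and any(hint in sym for hint in VFS_HINTS)
        for sym, reliable in trace_symbols(oops_text)
    )
```

Feeding each saved Oops through looks_like_vfs_lookup() gives a crude count of how many crashes land in the lookup path; the hint list would need tuning for other kernel versions.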
The problem is found on all kernels up to 3.10-rc4.
It is also seen on different Haswell variants; at least Mobile GT2 and ULT
show the problem.
Some more data points:
- The S4 problem appears with both the user-space and the kernel hibernation methods.
- S4 cycles are more stable when no network is connected.
The crash above was seen with the following test setup:
* running SLED11 user-space with an updated X stack, and
* starting netconsole over Ethernet (r8169 or e1000e drivers)
Without the network connection, S4 once survived over 100 cycles.
But that might be just luck.
- S4 becomes more stable if you disable loading of the i915 module in the initrd.
On SUSE kernels, the i915 module is loaded in the initrd, and the initrd triggers
the resume from the S4 image either via the user-space suspend command or by
writing to sysfs.
When I exclude the i915 module by setting $NO_KMS_IN_INITRD in
/etc/sysconfig/kernel and running mkinitrd, the problem is rarely seen.
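For anyone trying to reproduce the cycle counts mentioned above, a repeated S4 test can be scripted roughly as follows. This is a sketch only: the report does not say how the cycles were driven, so the use of rtcwake (from util-linux) to wake the machine is an assumption, and run_s4_cycles and hibernate_cmd are hypothetical helper names. It must run as root.

```python
import subprocess


def hibernate_cmd(wake_after_s):
    """Command that enters S4 and programs an RTC wake-up alarm."""
    # rtcwake: "-m disk" selects suspend-to-disk, "-s" sets the alarm in seconds.
    return ["rtcwake", "-m", "disk", "-s", str(wake_after_s)]


def run_s4_cycles(max_cycles, wake_after_s=60, runner=subprocess.call):
    """Run up to max_cycles hibernate/resume cycles; return how many completed.

    runner is injectable so the loop can be exercised without hibernating.
    """
    completed = 0
    for _ in range(max_cycles):
        if runner(hibernate_cmd(wake_after_s)) != 0:
            break  # the hibernate command itself failed
        completed += 1
        print(f"S4 cycle {completed} completed", flush=True)
    return completed
```

If the machine hangs on resume, the script simply never prints the next line, so the last reported cycle number is the count to compare between configurations.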