[Nouveau] [Bug 91413] INFO: task Xorg:2419 blocked for more than 120 seconds.

bugzilla-daemon at freedesktop.org
Tue May 24 12:57:08 UTC 2016


https://bugs.freedesktop.org/show_bug.cgi?id=91413

Ilia Mirkin <imirkin at alum.mit.edu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |imirkin at alum.mit.edu

--- Comment #9 from Ilia Mirkin <imirkin at alum.mit.edu> ---
And now I'm getting the same issue after adding a GK208B and a (fanless) NV34
to my system. khugepaged gets stuck like so:

[83695.847012] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s!
[khugepaged:59]
[83695.847015] Modules linked in: rtl8xxxu mac80211 cfg80211 it87 hwmon_vid
nouveau uas usb_storage fbcon video bitblit softcursor font i2c_algo_bit ttm
drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt
fb_sys_fops cfbcopyarea drm backlight fb fbdev mxm_wmi wmi
[83695.847032] CPU: 0 PID: 59 Comm: khugepaged Tainted: G          I  L  4.6.0+
#2
[83695.847033] Hardware name: Gigabyte Technology Co., Ltd.
EX58-UD3R/EX58-UD3R, BIOS FB  05/04/2009
[83695.847035] task: ffff8801d822b000 ti: ffff8801d8314000 task.ti:
ffff8801d8314000
[83695.847036] RIP: 0010:[<ffffffff810e9fef>]  [<ffffffff810e9fef>]
smp_call_function_many+0x1de/0x1f1
[83695.847042] RSP: 0018:ffff8801d8317be0  EFLAGS: 00000202
[83695.847044] RAX: 0000000000000005 RBX: ffff8801dfc169c8 RCX:
0000000000000005
[83695.847045] RDX: ffff8801dfd59328 RSI: 0000000000000008 RDI:
ffff8801dfc169c8
[83695.847046] RBP: ffff8801d8317c20 R08: 0000000000000005 R09:
0000000000000000
[83695.847047] R10: 000000000000175d R11: 0000000000000009 R12:
ffff8801dfc169c0
[83695.847049] R13: 0000000000016980 R14: 0000000000000001 R15:
0000000000000008
[83695.847050] FS:  0000000000000000(0000) GS:ffff8801dfc00000(0000)
knlGS:0000000000000000
[83695.847051] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[83695.847053] CR2: 00007f2c7ed83000 CR3: 0000000001c07000 CR4:
00000000000006f0
[83695.847054] Stack:
[83695.847055]  0100000000000000 0000000000000000 ffffffff8114d0c8
0000000000000000
[83695.847057]  ffffffff8114d0c8 0000000000000000 ffffffff81f31bc8
0000000000000009
[83695.847059]  ffff8801d8317c50 ffffffff810ea0a5 0000000000000008
0000000000000000
[83695.847061] Call Trace:
[83695.847065]  [<ffffffff8114d0c8>] ? page_alloc_cpu_notify+0x41/0x41
[83695.847067]  [<ffffffff8114d0c8>] ? page_alloc_cpu_notify+0x41/0x41
[83695.847068]  [<ffffffff810ea0a5>] on_each_cpu_mask+0x28/0x48
[83695.847070]  [<ffffffff8114d5f2>] drain_all_pages+0x94/0xbb
[83695.847073]  [<ffffffff8114fa07>] __alloc_pages_nodemask+0x5f3/0x8b0
[83695.847075]  [<ffffffff81173b64>] ? __page_set_anon_rmap+0x31/0x7d
[83695.847079]  [<ffffffff8118a919>] __alloc_pages_node.isra.57+0x12/0x14
[83695.847081]  [<ffffffff8118ae09>] khugepaged+0xcc/0x10d1
[83695.847084]  [<ffffffff810bdebe>] ? finish_wait+0x62/0x62
[83695.847086]  [<ffffffff8118ad3d>] ? maybe_pmd_mkwrite+0x1a/0x1a
[83695.847089]  [<ffffffff810a8f93>] kthread+0xa5/0xad
[83695.847093]  [<ffffffff81766492>] ret_from_fork+0x22/0x40
[83695.847094]  [<ffffffff810a8eee>] ? init_completion+0x24/0x24
[83695.847095] Code: 74 2d 48 89 de 89 c7 e8 f8 f9 ff ff 3b 05 4e c1 c3 00 7d
1b 48 63 c8 49 8b 14 24 48 03 14 cd c0 4a d2 81 f6 42 18 01 74 04 f3 90 <eb> f6
eb d3 48 83 c4 18 5b 41 5c 41 5d 41 5e 41 5f 5d c3 66 66 

But looking at the backtraces of all the active CPUs, there's always one that
looks like this:

[83696.530737] NMI backtrace for cpu 5
[83696.530737] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G          I  L  4.6.0+
#2
[83696.530739] Hardware name: Gigabyte Technology Co., Ltd.
EX58-UD3R/EX58-UD3R, BIOS FB  05/04/2009
[83696.530740] task: ffff8801d8a3b000 ti: ffff8801d8a50000 task.ti:
ffff8801d8a50000
[83696.530741] RIP: 0010:[<ffffffff810bff24>]  [<ffffffff810bff24>]
queued_spin_lock_slowpath+0x59/0x173
[83696.530742] RSP: 0018:ffff8801dfd43b98  EFLAGS: 00000002
[83696.530743] RAX: 0000000000000101 RBX: 0000000000000086 RCX:
0000000000000101
[83696.530744] RDX: 0000000000000100 RSI: 0000000000000001 RDI:
ffff8801d75ff108
[83696.530745] RBP: ffff8801dfd43b98 R08: 0000000000000001 R09:
ffff8801dfd43d53
[83696.530746] R10: ffff8801dfd43d27 R11: 0000000000000000 R12:
ffff8801d75ff108
[83696.530747] R13: ffff8801d6ede000 R14: ffff8801d75ff000 R15:
ffff8801d528d800
[83696.530748] FS:  0000000000000000(0000) GS:ffff8801dfd40000(0000)
knlGS:0000000000000000
[83696.530749] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[83696.530750] CR2: 00000000004330c1 CR3: 0000000001c07000 CR4:
00000000000006e0
[83696.530751] Stack:
[83696.530752]  ffff8801dfd43bb0 ffffffff817661b4 0000000000000029
ffff8801dfd43c00
[83696.530753]  ffffffffc03489b7 0000000000000010 ffff8801d7707400
ffff8801d528d700
[83696.530754]  ffff8801d75ff000 0000000000000029 ffff8801d6ede000
0000000000000029
[83696.530755] Call Trace:
[83696.530756]  <IRQ> d [<ffffffff817661b4>] _raw_spin_lock_irqsave+0x23/0x29
[83696.530757]  [<ffffffffc03489b7>] nvkm_fantog_update+0x43/0x103 [nouveau]
[83696.530758]  [<ffffffffc0348ac9>] nvkm_fantog_set+0x38/0x3f [nouveau]
[83696.530759]  [<ffffffffc034818d>] nvkm_fan_update+0x12c/0x1a7 [nouveau]
[83696.530760]  [<ffffffffc0348255>] nvkm_therm_fan_set+0x19/0x1b [nouveau]
[83696.530761]  [<ffffffffc0347c5a>] nvkm_therm_update+0x223/0x230 [nouveau]
[83696.530762]  [<ffffffffc0347c7c>] nvkm_therm_alarm+0x15/0x17 [nouveau]
[83696.530763]  [<ffffffffc034a837>] nvkm_timer_alarm_trigger+0xde/0xf6
[nouveau]
[83696.530764]  [<ffffffffc034a943>] nvkm_timer_alarm+0xaa/0xb3 [nouveau]
[83696.530765]  [<ffffffffc0348a5c>] nvkm_fantog_update+0xe8/0x103 [nouveau]
[83696.530766]  [<ffffffffc0348a8f>] nvkm_fantog_alarm+0x18/0x1a [nouveau]
[83696.530767]  [<ffffffffc034a837>] nvkm_timer_alarm_trigger+0xde/0xf6
[nouveau]
[83696.530768]  [<ffffffffc034abb3>] nv04_timer_intr+0x39/0x9f [nouveau]
[83696.530769]  [<ffffffffc034a71f>] nvkm_timer_intr+0x14/0x16 [nouveau]
[83696.530770]  [<ffffffffc03115d8>] nvkm_subdev_intr+0x17/0x19 [nouveau]
[83696.530771]  [<ffffffffc033f5fb>] nvkm_mc_intr+0x81/0xd2 [nouveau]
[83696.530772]  [<ffffffffc0342efa>] nvkm_pci_intr+0x4a/0x5c [nouveau]
[83696.530773]  [<ffffffff810cbc26>] handle_irq_event_percpu+0x6c/0x196
[83696.530773]  [<ffffffff810cbd7b>] handle_irq_event+0x2b/0x4b
[83696.530774]  [<ffffffff810ce999>] handle_edge_irq+0xa6/0xc3
[83696.530775]  [<ffffffff8105841b>] handle_irq+0x109/0x111
[83696.530776]  [<ffffffff8176859b>] do_IRQ+0x4b/0xba
[83696.530777]  [<ffffffff81766bbf>] common_interrupt+0x7f/0x7f
[83696.530778]  <EOI> d [<ffffffff8157911f>] ? cpuidle_enter_state+0x103/0x15b
[83696.530779]  [<ffffffff815791a3>] cpuidle_enter+0x17/0x19
[83696.530780]  [<ffffffff810be61e>] cpu_startup_entry+0x192/0x1fd
[83696.530781]  [<ffffffff8106f2ff>] start_secondary+0xe0/0xe3
[83696.530782] Code: ff ff 75 33 83 fe 01 89 ca 89 f0 41 0f 45 d0 f0 0f b1 17
39 f0 74 04 89 c6 eb e1 ff ca 0f 84 20 01 00 00 8b 07 84 c0 74 04 f3 90 <eb> f6
66 c7 07 01 00 e9 0c 01 00 00 48 c7 c0 40 65 01 00 65 48 

Since it's in handle_irq, I assume that it takes some lock which basically
prevents the system from proceeding (except for NMIs). I can ssh into the
system, and even run some things (like collecting these traces), but
khugepaged is at 100% CPU, and some other processes tend to hang. My VBIOS is
to follow shortly. The last time I debugged this, I determined that it could
happen if some time interval we computed turned out to be 0, causing the timer
callback to be executed directly from the interrupt.
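
To illustrate the suspected mechanism, here is a minimal toy model in Python.
It is not the nouveau code. The class names (Timer, FanToggle,
NonRecursiveLock) and the demo() helper are made up for illustration, and the
model assumes that the update path still holds its lock when it re-arms the
alarm, which the trace alone does not prove. It only shows how a zero interval
can turn "schedule for later" into "call right now", so that the update
function is re-entered and tries to take a lock its own caller already holds.

```python
class ReentryError(RuntimeError):
    """Raised where a real non-recursive spinlock would spin forever."""


class NonRecursiveLock:
    """Toy stand-in for a spinlock: a second acquire by the holder is fatal."""

    def __init__(self):
        self.held = False

    def __enter__(self):
        if self.held:
            raise ReentryError("lock taken again while already held")
        self.held = True
        return self

    def __exit__(self, *exc):
        self.held = False
        return False  # do not swallow exceptions


class Timer:
    """Toy alarm scheduler: a zero delay fires the callback immediately."""

    def __init__(self):
        self.pending = []

    def alarm(self, delay_ns, callback):
        if delay_ns == 0:
            callback()  # runs in the caller's context, not later
        else:
            self.pending.append((delay_ns, callback))


class FanToggle:
    """Hypothetical periodic updater that re-arms its own alarm."""

    def __init__(self, timer):
        self.timer = timer
        self.lock = NonRecursiveLock()

    def update(self, delay_ns):
        with self.lock:
            # Assumption: the alarm is re-armed while the lock is held.
            self.timer.alarm(delay_ns, lambda: self.update(delay_ns))


def demo(delay_ns):
    """Return "deadlock" if update() re-enters itself, else "ok"."""
    fan = FanToggle(Timer())
    try:
        fan.update(delay_ns)
    except ReentryError:
        return "deadlock"
    return "ok"
```

In this model demo(0) reports the re-entry, while any non-zero delay simply
queues the callback and returns. In a real kernel there is no exception to
raise: the CPU would just keep spinning in the lock slowpath, which is
consistent with what the CPU 5 backtrace shows.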
