[Nouveau] kernel bug nouveau, total system hang, X crashed

Peter Maloney peter.maloney at brockmann-consult.de
Tue Jun 18 05:49:02 PDT 2013


Hi,

With kernel 3.9.4 on openSUSE 12.1 (KDE 4.7.4, I think), my machine ran
fine for a long time with no problems. Today, on openSUSE 12.3
(KDE 4.10.3, Xorg 1.13.2, upgraded on Jun. 10), it hung
completely. I believe the nouveau driver is at fault rather than KDE or
X, so I chose this list. It may have been triggered by the
"Clock" ScreenLocker (screen saver). It has happened twice so far.

I'm not on the list, so please CC me.


Here is a snippet from syslog where the errors begin (while I was not
using the computer):

2013-06-14T03:59:34.103035+02:00 linux-zxd7 kernel: [303714.267370]
nouveau E[   PFIFO][0000:04:00.0] DMA_PUSHER - ch 2 [Xorg[1761]] get
0x002003b0d4 put 0x002003b134 ib_get 0x00000360 ib_put 0x00000361 state
0x80000024 (err: INVALID_CMD) push 0x00400040
2013-06-14T03:59:34.104254+02:00 linux-zxd7 kernel: [303714.267632]
nouveau E[   PFIFO][0000:04:00.0] DMA_PUSHER - ch 2 [Xorg[1761]] get
0x002003b134 put 0x002003b194 ib_get 0x00000362 ib_put 0x00000363 state
0x80000024 (err: INVALID_CMD) push 0x00400040
2013-06-14T03:59:34.120218+02:00 linux-zxd7 kernel: [303714.283686]
nouveau E[   PFIFO][0000:04:00.0] DMA_PUSHER - ch 2 [Xorg[1761]] get
0x002003b194 put 0x002003b1f4 ib_get 0x00000364 ib_put 0x00000365 state
0x80000024 (err: INVALID_CMD) push 0x00400040
2013-06-14T03:59:34.120238+02:00 linux-zxd7 kernel: [303714.283903]
nouveau E[   PFIFO][0000:04:00.0] DMA_PUSHER - ch 2 [Xorg[1761]] get
0x002003b1f4 put 0x002003b254 ib_get 0x00000366 ib_put 0x00000367 state
0x80000024 (err: INVALID_CMD) push 0x00400040
2013-06-14T03:59:34.120241+02:00 linux-zxd7 kernel: [303714.284025]
nouveau E[   PFIFO][0000:04:00.0] DMA_PUSHER - ch 2 [Xorg[1761]] get
0x002003b254 put 0x002003b2c0 ib_get 0x00000368 ib_put 0x00000369 state
0x80000024 (err: INVALID_CMD) push 0x00400040
2013-06-14T03:59:34.120244+02:00 linux-zxd7 kernel: [303714.284060]
nouveau E[   PFIFO][0000:04:00.0] DMA_PUSHER - ch 2 [Xorg[1761]] get
0x002003b2c0 put 0x002003b32c ib_get 0x0000036a ib_put 0x0000036b state
0x80000024 (err: INVALID_CMD) push 0x00400040
2013-06-14T03:59:34.120250+02:00 linux-zxd7 kernel: [303714.284092]
nouveau E[   PFIFO][0000:04:00.0] DMA_PUSHER - ch 2 [Xorg[1761]] get
0x002003b32c put 0x002003b398 ib_get 0x0000036c ib_put 0x0000036d state
0x80000024 (err: INVALID_CMD) push 0x00400040
2013-06-14T03:59:34.120252+02:00 linux-zxd7 kernel: [303714.284125]
nouveau E[   PFIFO][0000:04:00.0] DMA_PUSHER - ch 2 [Xorg[1761]] get
0x002003b398 put 0x002003b404 ib_get 0x0000036e ib_put 0x0000036f state
0x80000024 (err: INVALID_CMD) push 0x00400040
2013-06-14T03:59:34.124255+02:00 linux-zxd7 kernel: [303714.285213]
nouveau E[  PGRAPH][0000:04:00.0] TRAP_M2MF IN
2013-06-14T03:59:34.124266+02:00 linux-zxd7 kernel: [303714.285219]
nouveau E[  PGRAPH][0000:04:00.0] TRAP_M2MF 00320051 6ade1280 00000000
04000432
2013-06-14T03:59:34.124267+02:00 linux-zxd7 kernel: [303714.285222]
nouveau E[  PGRAPH][0000:04:00.0]  TRAP
2013-06-14T03:59:34.124268+02:00 linux-zxd7 kernel: [303714.285225]
nouveau E[  PGRAPH][0000:04:00.0] ch 2 [0x0037b10000 Xorg[1761]] subc 0
class 0x5039 mthd 0x0314 data 0x00000108
2013-06-14T03:59:34.124269+02:00 linux-zxd7 kernel: [303714.285236]
nouveau E[     PFB][0000:04:00.0] trapped read at 0x006adce9f0 on
channel 0x00037b10 [Xorg[1761]] PGRAPH/DISPATCH/M2M_IN reason:
PAGE_NOT_PRESENT
2013-06-14T03:59:34.136377+02:00 linux-zxd7 kernel: [303714.299033]
nouveau E[  PGRAPH][0000:04:00.0] TRAP_M2MF IN
2013-06-14T03:59:34.136392+02:00 linux-zxd7 kernel: [303714.299041]
nouveau E[  PGRAPH][0000:04:00.0] TRAP_M2MF 00320151 6add5080 00000000
04000000
2013-06-14T03:59:34.136394+02:00 linux-zxd7 kernel: [303714.299044]
nouveau E[  PGRAPH][0000:04:00.0]  TRAP
2013-06-14T03:59:34.136396+02:00 linux-zxd7 kernel: [303714.299047]
nouveau E[  PGRAPH][0000:04:00.0] ch 2 [0x0037b10000 Xorg[1761]] subc 0
class 0x5039 mthd 0x023c data 0x00000000
2013-06-14T03:59:34.136404+02:00 linux-zxd7 kernel: [303714.299057]
nouveau E[     PFB][0000:04:00.0] trapped read at 0x006add4de0 on
channel 0x00037b10 [Xorg[1761]] PGRAPH/DISPATCH/M2M_IN reason: NULL_DMAOBJ
2013-06-14T03:59:34.136406+02:00 linux-zxd7 kernel: [303714.299066]
nouveau E[  PGRAPH][0000:04:00.0] TRAP_M2MF IN
2013-06-14T03:59:34.136407+02:00 linux-zxd7 kernel: [303714.299071]
nouveau E[  PGRAPH][0000:04:00.0] TRAP_M2MF 00320151 00000380 00000000
04000000
2013-06-14T03:59:34.136417+02:00 linux-zxd7 kernel: [303714.299073]
nouveau E[  PGRAPH][0000:04:00.0]  TRAP
2013-06-14T03:59:34.136418+02:00 linux-zxd7 kernel: [303714.299075]
nouveau E[  PGRAPH][0000:04:00.0] ch 2 [0x0037b10000 Xorg[1761]] subc 0
class 0x5039 mthd 0x0200 data 0x00000001
2013-06-14T03:59:34.136420+02:00 linux-zxd7 kernel: [303714.299476]
nouveau E[  PGRAPH][0000:04:00.0] TRAP_M2MF IN
2013-06-14T03:59:34.136420+02:00 linux-zxd7 kernel: [303714.299481]
nouveau E[  PGRAPH][0000:04:00.0] TRAP_M2MF 00320151 6abdf380 00000000
04000000
2013-06-14T03:59:34.136421+02:00 linux-zxd7 kernel: [303714.299484]
nouveau E[  PGRAPH][0000:04:00.0]  TRAP
2013-06-14T03:59:34.136422+02:00 linux-zxd7 kernel: [303714.299486]
nouveau E[  PGRAPH][0000:04:00.0] ch 2 [0x0037b10000 Xorg[1761]] subc 0
class 0x5039 mthd 0x0328 data 0x00000000
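
For anyone skimming an excerpt like the one above, here is a minimal sketch (not part of the original report) that rejoins the mail-wrapped records and counts nouveau errors per engine tag. The function names and regular expressions are my own assumptions, written against the line format visible in this excerpt only; the sample is copied from the log above.

```python
import re
from collections import Counter

# Every syslog record in the excerpt starts with an ISO timestamp.
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")
# Engine tag followed by the PCI address, e.g. "nouveau E[   PFIFO][0000:04:00.0]".
# Messages tagged with a process instead of an engine do not match.
ENGINE_TAG = re.compile(r"nouveau E\[\s*(\w+)\]\[[0-9a-f:.]+\]")


def join_wrapped_records(text):
    """Rejoin records that the mail client wrapped across several lines."""
    records = []
    for line in text.splitlines():
        if RECORD_START.match(line) or not records:
            records.append(line.strip())
        else:
            records[-1] += " " + line.strip()
    return records


def count_errors_by_engine(text):
    """Count nouveau error records per engine tag (PFIFO, PGRAPH, PFB)."""
    counts = Counter()
    for record in join_wrapped_records(text):
        match = ENGINE_TAG.search(record)
        if match:
            counts[match.group(1)] += 1
    return counts


# Three records copied from the excerpt above, still wrapped as in the mail.
sample = """\
2013-06-14T03:59:34.103035+02:00 linux-zxd7 kernel: [303714.267370]
nouveau E[   PFIFO][0000:04:00.0] DMA_PUSHER - ch 2 [Xorg[1761]] get
0x002003b0d4 put 0x002003b134 ib_get 0x00000360 ib_put 0x00000361 state
0x80000024 (err: INVALID_CMD) push 0x00400040
2013-06-14T03:59:34.124255+02:00 linux-zxd7 kernel: [303714.285213]
nouveau E[  PGRAPH][0000:04:00.0] TRAP_M2MF IN
2013-06-14T03:59:34.124269+02:00 linux-zxd7 kernel: [303714.285236]
nouveau E[     PFB][0000:04:00.0] trapped read at 0x006adce9f0 on
channel 0x00037b10 [Xorg[1761]] PGRAPH/DISPATCH/M2M_IN reason:
PAGE_NOT_PRESENT
"""

print(count_errors_by_engine(sample))
```

Run against a full syslog, a count like this makes it easier to see whether the PFIFO errors precede the PGRAPH and PFB traps, as they do in the excerpt.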



And here is a stack trace from X crashing a bit later (also while I was
not using the computer):

2013-06-14T04:23:09.912406+02:00 linux-zxd7 kernel: [305129.599004] BUG:
soft lockup - CPU#0 stuck for 23s! [Xorg:29026]
2013-06-14T04:23:09.916319+02:00 linux-zxd7 kernel: [305129.599048]
Modules linked in: dm_snapshot af_packet arc4 ecb md4 sha256_generic md5
nls_utf8 cifs fscache vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O)
bnep bluetooth rfkill btrfs raid6_pq zlib_deflate
xor ufs qnx4 hfsplus hfs minix vfat msdos fat jfs xfs libcrc32c reiserfs
xt_tcpudp xt_pkttype xt_physdev xt_LOG xt_limit bridge stp llc
ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
ip6table_raw ipt_REJECT iptable_raw xt_CT iptable_filter ip6table_mangle
nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4
nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack
ip6table_filter fuse ip6_tables x_tables dm_mod snd_hda_codec_analog
snd_hda_intel snd_hda_codec snd_hwdep acpi_cpufreq snd_pcm
 mperf coretemp snd_seq snd_timer snd_seq_device kvm_intel snd mvsas
libsas kvm ata_generic shpchp firewire_ohci sr_mod i7core_edac
pci_hotplug firewire_core asus_atk0110 edac_core i2c_i801
pata_marvell cdrom r8169 iTCO_wdt iTCO_vendor_support ehci_pci lpc_ich
mfd_core sg crc32c_intel soundcore crc_itu_t scsi_transport_sas
snd_page_alloc pcspkr microcode autofs4 hid_generic usbhid uhci_hcd
ehci_hcd nouveau ttm xhci_hcd drm_kms_helper drm usbcore
i2c_algo_bit usb_common mxm_wmi video wmi button processor thermal_sys
scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh
2013-06-14T04:23:09.916333+02:00 linux-zxd7 kernel: [305129.599163] CPU 0
2013-06-14T04:23:09.916336+02:00 linux-zxd7 kernel: [305129.599168] Pid:
29026, comm: Xorg Tainted: G           O 3.9.4-1.g51bf0ff-default #1
System manufacturer System Product Name/P6T WS PRO
2013-06-14T04:23:09.916353+02:00 linux-zxd7 kernel: [305129.599170] RIP:
0010:[<ffffffff81584219>]  [<ffffffff81584219>]
_raw_spin_unlock_irqrestore+0x9/0x10
2013-06-14T04:23:09.916355+02:00 linux-zxd7 kernel: [305129.599179] RSP:
0018:ffff880605ecbb00  EFLAGS: 00000286
2013-06-14T04:23:09.916356+02:00 linux-zxd7 kernel: [305129.599181] RAX:
0000000000010001 RBX: ffffffffa016a2bc RCX: 0000000000000001
2013-06-14T04:23:09.916357+02:00 linux-zxd7 kernel: [305129.599183] RDX:
ffffc90013b00500 RSI: 0000000000000286 RDI: 0000000000000286
2013-06-14T04:23:09.916358+02:00 linux-zxd7 kernel: [305129.599185] RBP:
0000000000000501 R08: 0000000000000000 R09: 0000000000002e31
2013-06-14T04:23:09.916359+02:00 linux-zxd7 kernel: [305129.599187] R10:
0000000000000002 R11: 0000000000002e30 R12: ffff88061ab62d80
2013-06-14T04:23:09.916360+02:00 linux-zxd7 kernel: [305129.599189] R13:
ffff88061ab623c0 R14: ffffffffa016a595 R15: 0000000000000001
2013-06-14T04:23:09.916361+02:00 linux-zxd7 kernel: [305129.599194] FS:
0000000000000000(0000) GS:ffff88063fc00000(0000) knlGS:0000000000000000
2013-06-14T04:23:09.916362+02:00 linux-zxd7 kernel: [305129.599195] CS:
0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2013-06-14T04:23:09.916363+02:00 linux-zxd7 kernel: [305129.599197] CR2:
000000000280ad58 CR3: 0000000001a0d000 CR4: 00000000000007f0
2013-06-14T04:23:09.916364+02:00 linux-zxd7 kernel: [305129.599198] DR0:
0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2013-06-14T04:23:09.916364+02:00 linux-zxd7 kernel: [305129.599199] DR3:
0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
2013-06-14T04:23:09.916365+02:00 linux-zxd7 kernel: [305129.599200]
Process Xorg (pid: 29026, threadinfo ffff880605eca000, task
ffff8805ed822340)
2013-06-14T04:23:09.916366+02:00 linux-zxd7 kernel: [305129.599201] Stack:
2013-06-14T04:23:09.916367+02:00 linux-zxd7 kernel: [305129.599204]
ffffffffa01b4c6d 0000000000000015 ffff8801f65aa200 ffff88061b341980
2013-06-14T04:23:09.916368+02:00 linux-zxd7 kernel: [305129.599206]
ffff88061b341998 ffff88061b3419e8 ffffffffa016cec8 ffff88061b341998
2013-06-14T04:23:09.916369+02:00 linux-zxd7 kernel: [305129.599208]
ffff88061b3419b0 ffff880034815400 ffffffffa01c311a ffff88061b341980
2013-06-14T04:23:09.916370+02:00 linux-zxd7 kernel: [305129.599209] Call
Trace:
2013-06-14T04:23:09.916371+02:00 linux-zxd7 kernel: [305129.599248]
[<ffffffffa01b4c6d>] nv84_graph_tlb_flush+0x28d/0x2c0 [nouveau]
2013-06-14T04:23:09.916372+02:00 linux-zxd7 kernel: [305129.599370]
[<ffffffffa016cec8>] nv50_vm_flush+0x78/0x90 [nouveau]
2013-06-14T04:23:09.916373+02:00 linux-zxd7 kernel: [305129.599457]
[<ffffffffa01c311a>] nouveau_bo_vma_del+0x9a/0xa0 [nouveau]
2013-06-14T04:23:09.916374+02:00 linux-zxd7 kernel: [305129.599601]
[<ffffffffa01c5040>] nouveau_abi16_chan_fini.isra.1+0xa0/0x170 [nouveau]
2013-06-14T04:23:09.916375+02:00 linux-zxd7 kernel: [305129.599747]
[<ffffffffa01c5310>] nouveau_abi16_fini+0x30/0x80 [nouveau]
2013-06-14T04:23:09.916376+02:00 linux-zxd7 kernel: [305129.599889]
[<ffffffffa01bc0d7>] nouveau_drm_preclose+0x27/0x90 [nouveau]
2013-06-14T04:23:09.916377+02:00 linux-zxd7 kernel: [305129.600006]
[<ffffffffa00fe7fe>] drm_release+0x6e/0x620 [drm]
2013-06-14T04:23:09.916378+02:00 linux-zxd7 kernel: [305129.600019]
[<ffffffff81173c9b>] __fput+0xdb/0x240
2013-06-14T04:23:09.916379+02:00 linux-zxd7 kernel: [305129.600027]
[<ffffffff810655c4>] task_work_run+0xb4/0xd0
2013-06-14T04:23:09.916380+02:00 linux-zxd7 kernel: [305129.600033]
[<ffffffff8104b606>] do_exit+0x2b6/0xa40
2013-06-14T04:23:09.916380+02:00 linux-zxd7 kernel: [305129.600038]
[<ffffffff8104be08>] do_group_exit+0x38/0xa0
2013-06-14T04:23:09.916381+02:00 linux-zxd7 kernel: [305129.600044]
[<ffffffff8105a9f2>] get_signal_to_deliver+0x1b2/0x5d0
2013-06-14T04:23:09.916382+02:00 linux-zxd7 kernel: [305129.600051]
[<ffffffff81002353>] do_signal+0x63/0x8c0
2013-06-14T04:23:09.916383+02:00 linux-zxd7 kernel: [305129.600056]
[<ffffffff81002c48>] do_notify_resume+0x98/0xc0
2013-06-14T04:23:09.916384+02:00 linux-zxd7 kernel: [305129.600064]
[<ffffffff8158c36a>] int_signal+0x12/0x17
2013-06-14T04:23:09.916385+02:00 linux-zxd7 kernel: [305129.600074]
[<00007f93254763d5>] 0x7f93254763d4
2013-06-14T04:23:09.916386+02:00 linux-zxd7 kernel: [305129.600077]
Code: 66 39 c2 74 0f 0f 1f 44 00 00 f3 90 0f b7 07 66 39 d0 75 f6 c3 66
66 66 66 2e 0f 1f 84 00 00 00 00 00 66 83 07 01 48 89 f7 57 9d <66> 66
90 66 90 c3 90 ba ff ff ff ff f0 0f c1 17 83 ea 01 b8 01
2013-06-14T04:23:14.912452+02:00 linux-zxd7 kernel: [305132.598016]
nouveau E[Xorg[29026]] failed to idle channel 0xcccc0000 [Xorg[29026]]
2013-06-14T04:23:14.912466+02:00 linux-zxd7 kernel: [305134.597340]
nouveau E[  PGRAPH][0000:04:00.0] PGRAPH TLB flush idle timeout fail
2013-06-14T04:23:14.912468+02:00 linux-zxd7 kernel: [305134.597343]
nouveau E[  PGRAPH][0000:04:00.0] PGRAPH_STATUS  : 0x00000501 BUSY
CTXPROG CCACHE_UNK4
2013-06-14T04:23:14.912470+02:00 linux-zxd7 kernel: [305134.597349]
nouveau E[  PGRAPH][0000:04:00.0] PGRAPH_VSTATUS0: 0x00000008 CCACHE
2013-06-14T04:23:14.912472+02:00 linux-zxd7 kernel: [305134.597353]
nouveau E[  PGRAPH][0000:04:00.0] PGRAPH_VSTATUS1: 0x00000000
2013-06-14T04:23:14.912475+02:00 linux-zxd7 kernel: [305134.597357]
nouveau E[  PGRAPH][0000:04:00.0] PGRAPH_VSTATUS2: 0x00000000
2013-06-14T04:23:16.912520+02:00 linux-zxd7 kernel: [305136.596773]
nouveau E[  PGRAPH][0000:04:00.0] PGRAPH TLB flush idle timeout fail
2013-06-14T04:23:16.912527+02:00 linux-zxd7 kernel: [305136.596777]
nouveau E[  PGRAPH][0000:04:00.0] PGRAPH_STATUS  : 0x00000501 BUSY
CTXPROG CCACHE_UNK4
2013-06-14T04:23:16.912530+02:00 linux-zxd7 kernel: [305136.596782]
nouveau E[  PGRAPH][0000:04:00.0] PGRAPH_VSTATUS0: 0x00000008 CCACHE
2013-06-14T04:23:16.912532+02:00 linux-zxd7 kernel: [305136.596786]
nouveau E[  PGRAPH][0000:04:00.0] PGRAPH_VSTATUS1: 0x00000000
2013-06-14T04:23:16.912534+02:00 linux-zxd7 kernel: [305136.596789]
nouveau E[  PGRAPH][0000:04:00.0] PGRAPH_VSTATUS2: 0x00000000
2013-06-14T04:23:18.912705+02:00 linux-zxd7 kernel: [305138.596280]
nouveau E[  PGRAPH][0000:04:00.0] PGRAPH TLB flush idle timeout fail
2013-06-14T04:23:18.912711+02:00 linux-zxd7 kernel: [305138.596285]
nouveau E[  PGRAPH][0000:04:00.0] PGRAPH_STATUS  : 0x00000501 BUSY
CTXPROG CCACHE_UNK4
2013-06-14T04:23:18.912714+02:00 linux-zxd7 kernel: [305138.596291]
nouveau E[  PGRAPH][0000:04:00.0] PGRAPH_VSTATUS0: 0x00000008 CCACHE
2013-06-14T04:23:18.912716+02:00 linux-zxd7 kernel: [305138.596295]
nouveau E[  PGRAPH][0000:04:00.0] PGRAPH_VSTATUS1: 0x00000000
2013-06-14T04:23:18.912718+02:00 linux-zxd7 kernel: [305138.596298]
nouveau E[  PGRAPH][0000:04:00.0] PGRAPH_VSTATUS2: 0x00000000
2013-06-14T04:23:20.912856+02:00 linux-zxd7 kernel: [305140.595790]
nouveau E[  PGRAPH][0000:04:00.0] PGRAPH TLB flush idle timeout fail
2013-06-14T04:23:20.912868+02:00 linux-zxd7 kernel: [305140.595794]
nouveau E[  PGRAPH][0000:04:00.0] PGRAPH_STATUS  : 0x00000501 BUSY
CTXPROG CCACHE_UNK4
2013-06-14T04:23:20.912872+02:00 linux-zxd7 kernel: [305140.595798]
nouveau E[  PGRAPH][0000:04:00.0] PGRAPH_VSTATUS0: 0x00000008 CCACHE
2013-06-14T04:23:20.912875+02:00 linux-zxd7 kernel: [305140.595801]
nouveau E[  PGRAPH][0000:04:00.0] PGRAPH_VSTATUS1: 0x00000000
2013-06-14T04:23:20.912877+02:00 linux-zxd7 kernel: [305140.595804]
nouveau E[  PGRAPH][0000:04:00.0] PGRAPH_VSTATUS2: 0x00000000
2013-06-14T04:23:22.915949+02:00 linux-zxd7 kdm[1402]: X server for
display :0 terminated unexpectedly
2013-06-14T04:23:22.916194+02:00 linux-zxd7 kernel: [305142.595270]
nouveau E[   PFIFO][0000:04:00.0] channel 2 [Xorg[29026]] unload timeout
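
To read the call trace above more quickly, here is a small sketch (again not part of the original report) that pulls the symbol and owning module out of each frame. The regular expression and the "kernel" label for frames without a module suffix are my own assumptions; frames that carry only a raw address, such as the final user-space entry, are skipped.

```python
import re

# One frame looks like "[<addr>] symbol+0xoff/0xsize [module]"; the module
# suffix is absent for symbols built into the core kernel image.
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def call_trace_frames(text):
    """Return (symbol, module) pairs in the order they appear in the trace."""
    # Collapse mail-wrapped lines so each address sits next to its symbol.
    flat = " ".join(text.split())
    return [
        (m.group("symbol"), m.group("module") or "kernel")  # "kernel" is a label chosen here
        for m in FRAME.finditer(flat)
    ]


# Four frames copied from the trace above, still wrapped as in the mail.
trace_sample = """\
2013-06-14T04:23:09.916371+02:00 linux-zxd7 kernel: [305129.599248]
[<ffffffffa01b4c6d>] nv84_graph_tlb_flush+0x28d/0x2c0 [nouveau]
2013-06-14T04:23:09.916372+02:00 linux-zxd7 kernel: [305129.599370]
[<ffffffffa016cec8>] nv50_vm_flush+0x78/0x90 [nouveau]
2013-06-14T04:23:09.916377+02:00 linux-zxd7 kernel: [305129.600006]
[<ffffffffa00fe7fe>] drm_release+0x6e/0x620 [drm]
2013-06-14T04:23:09.916378+02:00 linux-zxd7 kernel: [305129.600019]
[<ffffffff81173c9b>] __fput+0xdb/0x240
"""

for symbol, module in call_trace_frames(trace_sample):
    print(f"{module:8s} {symbol}")
```

Listing the frames this way shows the path at a glance: the process exit closes the DRM file, which tears down the nouveau channel and ends in nv84_graph_tlb_flush, the function that then reports the TLB flush idle timeouts.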


After X crashed, I could press ctrl+alt+f1 to get to a text terminal,
where I tried to restart X. That made the system hang completely:
neither ctrl+alt+del nor even alt+sysrq+b would reboot the system.

Here is a screenshot of what it looked like at this point:
http://s270.photobucket.com/user/peetaur/media/afterXrestarted_zps0b6fcbad.jpg.html



# lspci | grep VGA
04:00.0 VGA compatible controller: NVIDIA Corporation GT200 [GeForce GTX
260] (rev a1)
# uname -a
Linux peter 3.9.4-1.g51bf0ff-default #1 SMP Fri May 24 19:52:42 UTC 2013
(51bf0ff) x86_64 x86_64 x86_64 GNU/Linux
# kde4-config --version
Qt: 4.8.4
KDE Development Platform: 4.10.3 "release 1"
kde4-config: 1.0
# X -version

X.Org X Server 1.13.2
Release Date: 2013-01-24
X Protocol Version 11, Revision 0
Build Operating System: openSUSE SUSE LINUX
Current Operating System: Linux peter 3.9.4-1.g51bf0ff-default #1 SMP
Fri May 24 19:52:42 UTC 2013 (51bf0ff) x86_64
Kernel command line: BOOT_IMAGE=/vmlinuz-3.9.4-1.g51bf0ff-default
root=UUID=93a77b67-6950-476c-9709-f248bfa94e76
resume=/dev/disk/by-id/ata-Hitachi_HDS5C3030ALA630_MJ1311YNG44E5A-part5
splash=silent quiet showopts
Build Date: 30 April 2013  08:24:17AM
 Current version of pixman: 0.28.2
        Before reporting problems, check http://wiki.x.org
        to make sure that you have the latest version.



