[Nouveau] [Bug 70390] [NV84] Repeated system crashes under graphics load, E[PFIFO] DMA_PUSHER and lots of E[PGRAPH]

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Thu Oct 17 11:54:08 PDT 2013


https://bugs.freedesktop.org/show_bug.cgi?id=70390

--- Comment #12 from Martin von Gagern <Martin.vGagern at gmx.net> ---
(In reply to comment #11)
> Perhaps I can interest you in WARN_ON_ONCE.

That is very useful, thanks a lot.

So far I had left the BUG_ON in place, assuming that it would never trigger. I
was wrong: I just got a kernel BUG report from what seems to be the BUG_ON I
added.

[45018.412278] ------------[ cut here ]------------
[45018.416902] kernel BUG at drivers/gpu/drm/nouveau/nouveau_bo.c:465!
[45018.423162] invalid opcode: 0000 [#1] PREEMPT SMP 
[45018.428001] Modules linked in: nls_cp850 vfat fat usb_storage tun autofs4
ipv6 btrfs xor zlib_deflate raid6_pq libcrc32c dm_mod fuse nfs lockd
nf_conntrack_h323 nf_conntrack_sip nf_conntrack_irc nf_conntrack_ftp
nf_conntrack uhci_hcd sunrpc loop nouveau usbhid snd_hda_codec_via
snd_hda_intel ohci_pci snd_hda_codec ohci_hcd snd_hwdep snd_bt87x ehci_pci
ehci_hcd snd_pcm video usbcore sr_mod cdrom mxm_wmi i2c_algo_bit kvm_amd kvm
ttm drm_kms_helper snd_page_alloc drm microcode k10temp pcspkr evdev snd_timer
snd ata_generic i2c_core r8169 asus_atk0110 sym53c8xx parport_pc
scsi_transport_spi backlight mii parport wmi button usb_common pata_atiixp
soundcore acpi_cpufreq mperf
[45018.488491] CPU: 0 PID: 2834 Comm: X Not tainted 3.11.4-gentoo #1
[45018.494579] Hardware name: System manufacturer System Product Name/M4A785TD-V EVO, BIOS 2105    07/23/2010
[45018.504216] task: ffff8803ebc09910 ti: ffff8803e7dc4000 task.ti: ffff8803e7dc4000
[45018.511689] RIP: 0010:[<ffffffffa038d7fb>]  [<ffffffffa038d7fb>] nouveau_bo_wr32+0x4b/0x50 [nouveau]
[45018.520850] RSP: 0018:ffff8803e7dc5bd0  EFLAGS: 00010246
[45018.526157] RAX: 0000000000000000 RBX: ffff8803ec2d8780 RCX: 0000000000000000
[45018.533284] RDX: 0000000000406040 RSI: ffffc9001019f934 RDI: 0000000000000001
[45018.540409] RBP: 0000000000000000 R08: ffffc90010196000 R09: ffffc90010196000
[45018.547535] R10: 0000000000000000 R11: 0000000000000000 R12: 000000002001a020
[45018.554660] R13: 0000000000406040 R14: ffff8802933b2280 R15: 0000000000000000
[45018.561787] FS:  00007f9d09b26880(0000) GS:ffff8803ffc00000(0000) knlGS:00000000f6da2b90
[45018.569865] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[45018.575607] CR2: 00007ff639a65000 CR3: 00000003ec68f000 CR4: 00000000000007f0
[45018.582732] Stack:
[45018.584742]  ffffffffa039343a ffffffffa038ce76 ffff8803e934c6c0 ffff8803e97bde00
[45018.592196]  0000000000000000 ffff8803e7dc5d18 ffffffffa038a02d ffff88015ebe80c0
[45018.599644]  ffff8803ec2d8780 ffff8803ec2d8780 ffff8803e7dc5d18 0000000000000000
[45018.607098] Call Trace:
[45018.609550]  [<ffffffffa039343a>] ? nv84_fence_emit32+0xda/0x1a0 [nouveau]
[45018.616424]  [<ffffffffa038ce76>] ? nouveau_bo_placement_set+0x76/0x130 [nouveau]
[45018.623904]  [<ffffffffa038a02d>] ? nouveau_fence_emit+0x3d/0xb0 [nouveau]
[45018.630780]  [<ffffffffa038a8c4>] ? nouveau_fence_new+0x64/0xb0 [nouveau]
[45018.637568]  [<ffffffffa0389898>] ? nv50_dma_push+0xc8/0xf0 [nouveau]
[45018.644012]  [<ffffffffa038fdbb>] ? nouveau_gem_ioctl_pushbuf+0x35b/0x12f0 [nouveau]
[45018.651746]  [<ffffffff811166f0>] ? __pollwait+0x110/0x110
[45018.657232]  [<ffffffffa00ed105>] ? drm_ioctl+0x4b5/0x5b0 [drm]
[45018.663156]  [<ffffffffa038fa60>] ? nouveau_gem_ioctl_new+0x1b0/0x1b0 [nouveau]
[45018.670456]  [<ffffffff8111587b>] ? do_vfs_ioctl+0x8b/0x510
[45018.676025]  [<ffffffff81104b45>] ? vfs_read+0x165/0x190
[45018.681333]  [<ffffffff81115da0>] ? SyS_ioctl+0xa0/0xc0
[45018.686553]  [<ffffffff813d7212>] ? system_call_fastpath+0x16/0x1b
[45018.692725] Code: 85 c0 75 04 89 16 c3 90 89 d7 66 0f 1f 44 00 00 e9 bb c8 e7 e0 0f b7 0d b4 30 06 00 8d 79 01 66 85 c9 66 89 3d a7 30 06 00 75 d5 <0f> 0b 0f 1f 00 41 57 41 bf ff 07 00 00 41 56 41 55 41 54 55 53
[45018.712668] RIP  [<ffffffffa038d7fb>] nouveau_bo_wr32+0x4b/0x50 [nouveau]
[45018.719482]  RSP <ffff8803e7dc5bd0>
[45018.735366] ---[ end trace 706c9cb9b21fa0ba ]---
[45033.732881] nouveau E[ X[2834]] failed to idle channel 0xcccc0000 [X[2834]]
[45048.725980] nouveau E[ X[2834]] failed to idle channel 0xcccc0000 [X[2834]]

I compared the machine code to a disassembly of nouveau_bo_wr32, and the
faulting instruction is indeed in the code path conditioned on a comparison
with 0x406040, even though that comparison itself is not among the dumped
bytes. The value 0x406040 also appears in RDX and R13 in the register dump.

This time I recall no unusual graphics workload. The machine was mostly idle,
with the display in power-save mode. It didn't wake up from that, though, and
didn't react to NumLock either. I managed to ssh into the machine and save the
dmesg output before rebooting. So there was no automatic reboot this time,
possibly because the problematic value didn't make it further down the pipe.

Does the stack trace provide any insight into what might be going on here? Does
it tell us whether the bug is in kernel space or in user space?

(In reply to comment #11)
> Take a look at nouveau_gem_pushbuf_validate for that --
> it presently doesn't do any actual data validation.

Does that mean any unprivileged process with access to the video device can
send garbage to the GPU and crash the system?

> You could also do the check in nv50_dma_push.

I had a look at that, but didn't understand what kind of data to inspect. Now,
though, it seems that this data would pass through my bug-reporting check in
any case, unless what I reported was a false alarm and the value would have
been interpreted as something other than a command.
