Nouveau failing during probe followed by GPF on 3.13-rc2

Bruno Prémont bonbons at linux-vserver.org
Wed Dec 4 06:45:00 PST 2013


Hi Ilia,

On Wed, 4 Dec 2013 06:15:30 -0500 Ilia Mirkin wrote:
> On Wed, Dec 4, 2013 at 6:01 AM, Bruno Prémont wrote:
> > With 3.13-rc1 and 3.13-rc2 kernel crashes/BUGs while loading nouveau:
> > [  657.654915] ACPI Warning: \_SB_.PCI0.IXVE.IGPU._DSM: Argument #4 type mismatch - Found [Integer], ACPI requires [Package] (20131115/nsarguments-95)
> > [  657.655099] ACPI Warning: \_SB_.PCI0.IXVE.IGPU._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20131115/nsarguments-95)
> > [  657.655270] checking generic (80010000 640000) vs hw (80000000 10000000)
> > [  657.655273] fb: conflicting fb hw usage nouveaufb vs simple - removing generic driver
> > [  657.655383] Console: switching to colour dummy device 80x25
> > [  657.655632] nouveau 0000:02:00.0: enabling device (0006 -> 0007)
> > [  657.657149] ACPI: PCI Interrupt Link [LGPU] enabled at IRQ 16
> > [  657.657456] [drm] hdmi device  not found 2 0 1
> > [  657.657954] nouveau  [  DEVICE][0000:02:00.0] BOOT0  : 0x0ac800b1
> > [  657.657958] nouveau  [  DEVICE][0000:02:00.0] Chipset: MCP79/MCP7A (NVAC)
> > [  657.657960] nouveau  [  DEVICE][0000:02:00.0] Family : NV50
> > [  657.665274] nouveau  [   VBIOS][0000:02:00.0] checking PRAMIN for image...
> > [  657.722478] nouveau  [   VBIOS][0000:02:00.0] ... appears to be valid
> > [  657.722481] nouveau  [   VBIOS][0000:02:00.0] using image from PRAMIN
> > [  657.722624] nouveau  [   VBIOS][0000:02:00.0] BIT signature found
> > [  657.722627] nouveau  [   VBIOS][0000:02:00.0] version 62.79.47.00.01
> > [  657.745324] nouveau 0000:02:00.0: irq 42 for MSI/MSI-X
> > [  657.745360] nouveau  [     PMC][0000:02:00.0] MSI interrupts enabled
> > [  657.745437] nouveau  [     PFB][0000:02:00.0] RAM type: stolen system memory
> > [  657.745441] nouveau  [     PFB][0000:02:00.0] RAM size: 256 MiB
> > [  657.745444] nouveau  [     PFB][0000:02:00.0]    ZCOMP: 0 tags
> > [  657.800072] nouveau  [  PTHERM][0000:02:00.0] FAN control: none / external
> > [  657.800083] nouveau  [  PTHERM][0000:02:00.0] fan management: automatic
> > [  657.800086] nouveau  [  PTHERM][0000:02:00.0] internal sensor: yes
> > [  657.800105] nouveau  [     CLK][0000:02:00.0] 03: core 100 MHz shader 200 MHz
> > [  657.800111] nouveau  [     CLK][0000:02:00.0] 05: core 150 MHz shader 300 MHz
> > [  657.800116] nouveau  [     CLK][0000:02:00.0] 0e: core 300 MHz shader 600 MHz
> > [  657.800121] nouveau  [     CLK][0000:02:00.0] 0f: core 350 MHz shader 800 MHz
> > [  657.800135] nouveau E[     CLK][0000:02:00.0] 17 freq unknown
> > [  657.800137] nouveau E[     CLK][0000:02:00.0] init failed, -22
> 
> There are some patches in
> http://cgit.freedesktop.org/nouveau/linux-2.6/log/?h=drm-nouveau-next
> that should help with that, specifically:
> 
> http://cgit.freedesktop.org/nouveau/linux-2.6/commit/?h=drm-nouveau-next&id=a7e4201f0f7d47e03b851f06f8987856e8d33083

Yes, that one prevents the "freq unknown" error!
It should probably be pushed to Dave/Linus for rc3.

With it applied, nouveau loads successfully.

> > [  657.800140] nouveau E[     DRM] failed to create 0x80000080, -22
> > [  657.802123] general protection fault: 0000 [#1] SMP
> > [  657.802130] Modules linked in: nouveau(+) ttm drm_kms_helper
> > [  657.802140] CPU: 0 PID: 2999 Comm: modprobe Not tainted 3.13.0-rc2-air+ #5
> > [  657.802144] Hardware name: Apple Inc. MacBookAir2,1/Mac-F42D88C8, BIOS    MBA21.88Z.0075.B03.0811141325 11/14/08
> > [  657.802150] task: ffff88007f161520 ti: ffff88007defe000 task.ti: ffff88007defe000
> > [  657.802154] RIP: 0010:[<ffffffff813d2af0>]  [<ffffffff813d2af0>] device_del+0x10/0x1b0
> > [  657.802165] RSP: 0018:ffff88007deff9f8  EFLAGS: 00010292
> > [  657.802168] RAX: 0000000000000000 RBX: 6b6b6b6b6b6b6b6b RCX: ffffffff81a6f237
> > [  657.802173] RDX: ffffffff81876dea RSI: ffffffff81a6e811 RDI: 6b6b6b6b6b6b6b6b
> > [  657.802177] RBP: ffff88007deffa18 R08: 000000006b6b6b6b R09: 0000000000000000
> > [  657.802181] R10: ffff880078801d00 R11: 000000000000002e R12: 6b6b6b6b6b6b6b6b
> > [  657.802185] R13: ffff88007f5720f8 R14: ffffffffa010e7a0 R15: 00000000ffffffea
> > [  657.802189] FS:  00007f3c23d75700(0000) GS:ffff88007b000000(0000) knlGS:0000000000000000
> > [  657.802194] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [  657.802198] CR2: 00007f27436e40f0 CR3: 000000007db4e000 CR4: 00000000000007f0
> > [  657.802201] Stack:
> > [  657.802204]  ffffffff8134fd0b 6b6b6b6b6b6b6b6b ffff88007f572060 ffff88007f5720f8
> > [  657.802211]  ffff88007deffa38 ffffffff813d2ca1 ffff88007d938058 ffff88007da01ca8
> > [  657.802217]  ffff88007deffa58 ffffffff813bdd6a ffff88007f572060 ffff88007da01ca8
> > [  657.802224] Call Trace:
> > [  657.802231]  [<ffffffff8134fd0b>] ? acpi_pci_irq_disable+0x3c/0x49
> > [  657.802237]  [<ffffffff813d2ca1>] device_unregister+0x11/0x20
> > [  657.802243]  [<ffffffff813bdd6a>] drm_sysfs_device_remove+0x1a/0x30
> > [  657.802249]  [<ffffffff813b9dbd>] drm_unplug_minor+0x1d/0x40
> > [  657.802255]  [<ffffffff813ba0cd>] drm_put_minor+0x3d/0x50
> > [  657.802260]  [<ffffffff813ba0f8>] drm_dev_free+0x18/0x80
> > [  657.802265]  [<ffffffff813bc67f>] drm_get_pci_dev+0xaf/0x150
> > [  657.802272]  [<ffffffff8131d8ce>] ? pcibios_set_master+0x5e/0x90
> > [  657.802315]  [<ffffffffa00a7eba>] nouveau_drm_probe+0x24a/0x290 [nouveau]
> > [  657.802321]  [<ffffffff8131f36c>] pci_device_probe+0x9c/0xf0
> > [  657.802328]  [<ffffffff813d6046>] driver_probe_device+0x76/0x240
> > [  657.802333]  [<ffffffff813d62ab>] __driver_attach+0x9b/0xa0
> > [  657.802339]  [<ffffffff813d6210>] ? driver_probe_device+0x240/0x240
> > [  657.802345]  [<ffffffff813d43b5>] bus_for_each_dev+0x55/0x90
> > [  657.802350]  [<ffffffff813d5b79>] driver_attach+0x19/0x20
> > [  657.802355]  [<ffffffff813d577c>] bus_add_driver+0x10c/0x210
> > [  657.802360]  [<ffffffffa0133000>] ? 0xffffffffa0132fff
> > [  657.802365]  [<ffffffff813d692f>] driver_register+0x5f/0xf0
> > [  657.802370]  [<ffffffffa0133000>] ? 0xffffffffa0132fff
> > [  657.802375]  [<ffffffff8131e697>] __pci_register_driver+0x47/0x50
> > [  657.802381]  [<ffffffff813bc835>] drm_pci_init+0x115/0x130
> > [  657.802386]  [<ffffffffa0133000>] ? 0xffffffffa0132fff
> > [  657.802390]  [<ffffffffa0133000>] ? 0xffffffffa0132fff
> > [  657.802414]  [<ffffffffa0133043>] nouveau_drm_init+0x43/0x1000 [nouveau]
> > [  657.802422]  [<ffffffff8100034a>] do_one_initcall+0x11a/0x170
> > [  657.802429]  [<ffffffff81071e33>] ? set_memory_nx+0x43/0x50
> > [  657.802435]  [<ffffffff8113a132>] ? __vunmap+0xb2/0x100
> > [  657.802441]  [<ffffffff810eeb26>] load_module+0x1966/0x21b0
> > [  657.802446]  [<ffffffff810ec070>] ? show_initstate+0x50/0x50
> > [  657.802453]  [<ffffffff8115bc94>] ? vfs_read+0x114/0x160
> > [  657.802458]  [<ffffffff810ef4a6>] SyS_finit_module+0x86/0x90
> > [  657.802465]  [<ffffffff817235e2>] system_call_fastpath+0x16/0x1b
> > [  657.802469] Code: 74 24 18 48 89 df e8 90 ff ff ff 48 8b 5d e8 4c 8b 65 f0 4c 8b 6d f8 c9 c3 66 90 55 48 89 e5 41 55 41 54 49 89 fc 53 48 83 ec 08 <48> 8b 87 88 00 00 00 4c 8b 2f 48 85 c0 74 1b 48 8b b8 90 00 00
> > [  657.802514] RIP  [<ffffffff813d2af0>] device_del+0x10/0x1b0
> > [  657.802520]  RSP <ffff88007deff9f8>
> > [  657.802524] ---[ end trace 11e780c61d88afaf ]---
> >
> > I'm booting with efi stub and SYSFB=y, FB_SIMPLE=y, DRM_NOUVEAU=m
> > Same config did boot properly with 3.12. Above output contains complete
> > output from the time of calling modprobe nouveau.
> 
> Hrm.... that is a separate bug that we should probably figure out.
> Looks like some use-after-free when nouveau fails to come up (note the
> poison 0x6b values in various registers). But the above patch will
> hopefully prevent that situation.

Yep, I enable SLUB poisoning on all my kernels with slub_debug=FZP.
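
For illustration only, here is a minimal Python sketch (not part of any
kernel tooling) that flags which registers in the oops quoted above hold
nothing but the SLUB free-poison byte 0x6b. The helper name
poisoned_registers() and the pasted OOPS string are assumptions made for
this example; the register values are copied from the trace.

```python
import re

POISON_FREE = 0x6B  # byte SLUB writes over freed objects when poisoning is on

# Register lines copied from the oops quoted above.
OOPS = """\
RAX: 0000000000000000 RBX: 6b6b6b6b6b6b6b6b RCX: ffffffff81a6f237
RDX: ffffffff81876dea RSI: ffffffff81a6e811 RDI: 6b6b6b6b6b6b6b6b
RBP: ffff88007deffa18 R08: 000000006b6b6b6b R09: 0000000000000000
R10: ffff880078801d00 R11: 000000000000002e R12: 6b6b6b6b6b6b6b6b
R13: ffff88007f5720f8 R14: ffffffffa010e7a0 R15: 00000000ffffffea
"""


def poisoned_registers(text):
    """Return names of registers whose 64-bit value is entirely 0x6b bytes."""
    full = int.from_bytes(bytes([POISON_FREE]) * 8, "big")
    hits = []
    # Match "NAME: <16 hex digits>" pairs such as "RDI: 6b6b6b6b6b6b6b6b".
    for name, value in re.findall(r"\b(R[A-Z0-9]{2}): ([0-9a-f]{16})\b", text):
        if int(value, 16) == full:
            hits.append(name)
    return hits


print(poisoned_registers(OOPS))
```

On x86-64 the first function argument is passed in RDI, so a fully
poisoned RDI on entry to device_del() suggests the struct device pointer
itself was read from already-freed memory.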

How much of the trace can be trusted to reflect real call frames, rather
than leftover, not-yet-overwritten stack data being mis-parsed?

If it can be trusted, the offset in nouveau_drm_probe() falls within
alloc_apertures(), which does not really make sense: efifb has already
been removed, so we should be seeing code that runs after
remove_conflicting_framebuffers().
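
As a rough sketch of why the generic framebuffer was removed in the first
place, the two ranges from the "checking generic (80010000 640000) vs hw
(80000000 10000000)" line can be compared with a plain interval-overlap
test. This is a generic check written for illustration, assuming each
pair is (base, size) in hex; it is not necessarily the exact comparison
the kernel performs, and ranges_overlap() is a hypothetical helper name.

```python
def ranges_overlap(base_a, size_a, base_b, size_b):
    """True if [base_a, base_a+size_a) and [base_b, base_b+size_b) intersect."""
    return base_a < base_b + size_b and base_b < base_a + size_a


# Values taken from the log line, interpreted as (base, size).
generic = (0x80010000, 0x640000)    # firmware framebuffer claimed by simplefb
hw = (0x80000000, 0x10000000)       # aperture announced by nouveau

print(ranges_overlap(*generic, *hw))
```

The generic range lies entirely inside the hardware aperture, which
matches the "conflicting fb hw usage" message that follows in the log.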

Probably SyS_finit_module() is the only relevant part of the stack trace,
and some module-assigned data has been double-freed/poisoned.

Thanks,
Bruno

