[Nouveau] BUG: unable to handle page fault for address nouveau_fence_new
Alan J. Wylie
alan at wylie.me.uk
Wed Aug 12 18:07:53 UTC 2020
Another two spontaneous reboots today. Latest one occured whilst I was
away from the computer, output is below. A different call trace this
time. No response to my previous report, so adding some e-mail addresses
from get_maintainer.pl / git blame.
It's an old graphics card, and this time I note references to RAM and
memory. Is there any possibility it's hardware? Is there a GPU
equivalent to memtest86+ ?
On Tue, 28 Jul 2020, "Alan J. Wylie" <alan at wylie.me.uk> writes:
> I've had several recent crashes of the nouveau kernel driver over the past
> month or so.
>
> My suspicion is that Firefox is causing it.
>
> The screen goes black and then the computer reboots.
>
> Nothing much in the syslogs, however I've managed to get netconsole output.
>
> It happens very infrequently and I'm afraid I don't know how to reproduce it,
> however I'll be more than happy to help by providing more information or
> debugging.
>
> Hardware:
> 01:00.0 VGA compatible controller: NVIDIA Corporation GK107 [GeForce GT 640] (rev a1)
>
> Kernel:
> Linux frodo 5.7.10 #21 SMP PREEMPT Wed Jul 22 13:01:11 BST 2020 x86_64 AMD FX(tm)-8350 Eight-Core Processor AuthenticAMD GNU/Linux
>
> Software:
> Recent Gentoo
> Nightly Firefox.
>
> [I] media-libs/mesa (20.0.8 at 04/07/20): OpenGL-like graphic library for Linux
> [I] x11-apps/mesa-progs (8.4.0 at 07/04/19): Mesa's OpenGL utility and demo programs (glxgears and glxinfo)
> [I] x11-drivers/xf86-video-nouveau (1.0.16 at 17/06/20): Accelerated Open Source driver for nVidia cards
> [I] x11-base/xorg-server (1.20.8-r1(0/1.20.8)@22/07/20): X.Org X servers
>
netconsole:
BUG: unable to handle page fault for address: 000000010050786b
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP
CPU: 2 PID: 1084 Comm: X Not tainted 5.8.1 #25
Hardware name: Gigabyte Technology Co., Ltd. GA-990XA-UD3/GA-990XA-UD3, BIOS F12 05/30/2012
RIP: 0010:__kmalloc+0xb1/0x2c0
Code: 89 c8 65 48 03 05 3f 29 df 53 48 8b 70 08 48 39 f2 75 e7 4c 8b 28 4d 85 ed 0f 84 d8 00 00 00 41 8b 47 20 49 8b 3f 48 8d 4a 08 <49> 8b 5c 05 00 4c 89 e8 65 48 0f c7 0f 0f 94 c0 84 c0 74 b9 41 8b
RSP: 0018:ffff976e40eb7910 EFLAGS: 00010202
RAX: 0000000000000030 RBX: 0000000000000000 RCX: 0000000001ab8932
RDX: 0000000001ab892a RSI: 0000000001ab892a RDI: 0000000000028a20
RBP: 0000000000000cc0 R08: 000000000000001a R09: 000000000000001a
R10: ffff8deed1efc090 R11: 000000000011b18f R12: 0000000000000052
R13: 000000010050783b R14: ffff8def75c07480 R15: ffff8def75c07480
FS: 00007f71b5e96dc0(0000) GS:ffff8def76c80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000010050786b CR3: 0000000227119000 CR4: 00000000000406e0
Call Trace:
nvif_object_init+0x7c/0x160 [nouveau]
nvif_mem_init_type+0xc8/0x1b0 [nouveau]
? nvkm_vram_map+0x56/0x80 [nouveau]
? nvkm_uvmm_mthd+0x794/0x7c0 [nouveau]
? nvkm_vmm_get_locked+0x37f/0x540 [nouveau]
nouveau_mem_vram+0xf1/0x1a0 [nouveau]
nouveau_vram_manager_new+0x91/0xd0 [nouveau]
ttm_bo_mem_space+0xd7/0x320 [ttm]
ttm_bo_validate+0x12e/0x1a0 [ttm]
? drm_vma_offset_add+0x41/0x90 [drm]
? nv10_bo_put_tile_region+0x90/0x90 [nouveau]
ttm_bo_init_reserved+0x2ad/0x320 [ttm]
ttm_bo_init+0x89/0x100 [ttm]
? nv10_bo_put_tile_region+0x90/0x90 [nouveau]
nouveau_bo_init+0xc1/0xf0 [nouveau]
? nv10_bo_put_tile_region+0x90/0x90 [nouveau]
nouveau_gem_new+0xcf/0x120 [nouveau]
? nouveau_gem_new+0x120/0x120 [nouveau]
nouveau_gem_ioctl_new+0x67/0xf0 [nouveau]
? nouveau_gem_new+0x120/0x120 [nouveau]
drm_ioctl_kernel+0xcc/0x110 [drm]
drm_ioctl+0x202/0x390 [drm]
? nouveau_gem_new+0x120/0x120 [nouveau]
nouveau_drm_ioctl+0x91/0xd0 [nouveau]
ksys_ioctl+0xa4/0xd0
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x3e/0x70
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f71b5568dd7
Code: 00 00 90 48 8b 05 a9 40 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 79 40 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007fff1a291988 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007fff1a2919d0 RCX: 00007f71b5568dd7
RDX: 00007fff1a2919d0 RSI: 00000000c0306480 RDI: 000000000000000a
RBP: 00000000c0306480 R08: 0000000000000000 R09: 00005575014822e0
R10: 00007f71b562d9e0 R11: 0000000000000246 R12: 00007fff1a2919d0
R13: 000000000000000a R14: 0000557500582e00 R15: 0000000000000000
Modules linked in: essiv authenc dm_crypt binfmt_misc netconsole configfs sha256_generic libsha256 cfg80211 8021q veth cpuid i2c_dev asus_atk0110 acpi_power_meter it87 hwmon_vid nouveau af_packet bridge stp evdev mxm_wmi llc snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic video snd_hda_intel ttm snd_intel_dspcfg drm_kms_helper snd_hda_codec snd_hda_core kvm_amd kvm snd_pcm syscopyarea snd_timer sysfillrect fam15h_power k10temp sysimgblt snd irqbypass fb_sys_fops soundcore i2c_piix4 wmi acpi_cpufreq softdog nfs nfsd auth_rpcgss lockd grace drm sunrpc drm_panel_orientation_quirks backlight agpgart usbhid ohci_pci ghash_clmulni_intel cryptd ehci_pci ohci_hcd sr_mod ehci_hcd cdrom xhci_pci xhci_hcd usbcore usb_common 8250 8250_base serial_core
CR2: 000000010050786b
---[ end trace 67649d0c2234e455 ]---
RIP: 0010:__kmalloc+0xb1/0x2c0
Code: 89 c8 65 48 03 05 3f 29 df 53 48 8b 70 08 48 39 f2 75 e7 4c 8b 28 4d 85 ed 0f 84 d8 00 00 00 41 8b 47 20 49 8b 3f 48 8d 4a 08 <49> 8b 5c 05 00 4c 89 e8 65 48 0f c7 0f 0f 94 c0 84 c0 74 b9 41 8b
RSP: 0018:ffff976e40eb7910 EFLAGS: 00010202
RAX: 0000000000000030 RBX: 0000000000000000 RCX: 0000000001ab8932
RDX: 0000000001ab892a RSI: 0000000001ab892a RDI: 0000000000028a20
RBP: 0000000000000cc0 R08: 000000000000001a R09: 000000000000001a
R10: ffff8deed1efc090 R11: 000000000011b18f R12: 0000000000000052
R13: 000000010050783b R14: ffff8def75c07480 R15: ffff8def75c07480
FS: 00007f71b5e96dc0(0000) GS:ffff8def76c80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000010050786b CR3: 0000000227119000 CR4: 00000000000406e0
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x2b000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Rebooting in 20 seconds..
And passed through decode_stacktrace.sh
# uname -a
Linux frodo 5.8.1 #25 SMP PREEMPT Tue Aug 11 19:47:00 BST 2020 x86_64 AMD FX(tm)-8350 Eight-Core Processor AuthenticAMD GNU/Linux
# /work/src.git/linux-stable/scripts/decode_stacktrace.sh /work/src.git/linux-stable/arch/x86/boot/compressed/vmlinux /work/src.git/linux-stable/ /lib/modules/5.8.1 < ~alan/nouveau/bug.001
BUG: unable to handle page fault for address: 000000010050786b
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP
CPU: 2 PID: 1084 Comm: X Not tainted 5.8.1 #25
Hardware name: Gigabyte Technology Co., Ltd. GA-990XA-UD3/GA-990XA-UD3, BIOS F12 05/30/2012
RIP: 0010:__kmalloc (??:?)
Code: 89 c8 65 48 03 05 3f 29 df 53 48 8b 70 08 48 39 f2 75 e7 4c 8b 28 4d 85 ed 0f 84 d8 00 00 00 41 8b 47 20 49 8b 3f 48 8d 4a 08 <49> 8b 5c 05 00 4c 89 e8 65 48 0f c7 0f 0f 94 c0 84 c0 74 b9 41 8b
All code
========
0: 89 c8 mov %ecx,%eax
2: 65 48 03 05 3f 29 df add %gs:0x53df293f(%rip),%rax # 0x53df2949
9: 53
a: 48 8b 70 08 mov 0x8(%rax),%rsi
e: 48 39 f2 cmp %rsi,%rdx
11: 75 e7 jne 0xfffffffffffffffa
13: 4c 8b 28 mov (%rax),%r13
16: 4d 85 ed test %r13,%r13
19: 0f 84 d8 00 00 00 je 0xf7
1f: 41 8b 47 20 mov 0x20(%r15),%eax
23: 49 8b 3f mov (%r15),%rdi
26: 48 8d 4a 08 lea 0x8(%rdx),%rcx
2a:* 49 8b 5c 05 00 mov 0x0(%r13,%rax,1),%rbx <-- trapping instruction
2f: 4c 89 e8 mov %r13,%rax
32: 65 48 0f c7 0f cmpxchg16b %gs:(%rdi)
37: 0f 94 c0 sete %al
3a: 84 c0 test %al,%al
3c: 74 b9 je 0xfffffffffffffff7
3e: 41 rex.B
3f: 8b .byte 0x8b
Code starting with the faulting instruction
===========================================
0: 49 8b 5c 05 00 mov 0x0(%r13,%rax,1),%rbx
5: 4c 89 e8 mov %r13,%rax
8: 65 48 0f c7 0f cmpxchg16b %gs:(%rdi)
d: 0f 94 c0 sete %al
10: 84 c0 test %al,%al
12: 74 b9 je 0xffffffffffffffcd
14: 41 rex.B
15: 8b .byte 0x8b
RSP: 0018:ffff976e40eb7910 EFLAGS: 00010202
RAX: 0000000000000030 RBX: 0000000000000000 RCX: 0000000001ab8932
RDX: 0000000001ab892a RSI: 0000000001ab892a RDI: 0000000000028a20
RBP: 0000000000000cc0 R08: 000000000000001a R09: 000000000000001a
R10: ffff8deed1efc090 R11: 000000000011b18f R12: 0000000000000052
R13: 000000010050783b R14: ffff8def75c07480 R15: ffff8def75c07480
FS: 00007f71b5e96dc0(0000) GS:ffff8def76c80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000010050786b CR3: 0000000227119000 CR4: 00000000000406e0
Call Trace:
nvif_object_init (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nvif/object.c:279) nouveau
nvif_mem_init_type (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nvif/mem.c:72) nouveau
? nvkm_vram_map (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nvkm/subdev/fb/ram.c:49) nouveau
? nvkm_uvmm_mthd (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c:218 /work/src.git/linux-stable/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c:340) nouveau
? nvkm_vmm_get_locked (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c:1769 (discriminator 4)) nouveau
nouveau_mem_vram (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nouveau_mem.c:155) nouveau
nouveau_vram_manager_new (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nouveau_ttm.c:76 /work/src.git/linux-stable/drivers/gpu/drm/nouveau/nouveau_ttm.c:59) nouveau
ttm_bo_mem_space (/work/src.git/linux-stable/drivers/gpu/drm/ttm/ttm_bo.c:1068) ttm
ttm_bo_validate (/work/src.git/linux-stable/drivers/gpu/drm/ttm/ttm_bo.c:1142 /work/src.git/linux-stable/drivers/gpu/drm/ttm/ttm_bo.c:1218) ttm
? drm_vma_offset_add (/work/src.git/linux-stable/drivers/gpu/drm/drm_vma_manager.c:215) drm
? nv10_bo_put_tile_region (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nouveau_bo.c:134) nouveau
ttm_bo_init_reserved (/work/src.git/linux-stable/drivers/gpu/drm/ttm/ttm_bo.c:1335) ttm
ttm_bo_init (/work/src.git/linux-stable/drivers/gpu/drm/ttm/ttm_bo.c:1369) ttm
? nv10_bo_put_tile_region (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nouveau_bo.c:134) nouveau
nouveau_bo_init (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nouveau_bo.c:317) nouveau
? nv10_bo_put_tile_region (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nouveau_bo.c:134) nouveau
nouveau_gem_new (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nouveau_gem.c:203) nouveau
? nouveau_gem_new (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nouveau_gem.c:263) nouveau
nouveau_gem_ioctl_new (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nouveau_gem.c:272) nouveau
? nouveau_gem_new (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nouveau_gem.c:263) nouveau
drm_ioctl_kernel (/work/src.git/linux-stable/drivers/gpu/drm/drm_ioctl.c:793) drm
drm_ioctl (/work/src.git/linux-stable/./include/linux/thread_info.h:119 /work/src.git/linux-stable/./include/linux/thread_info.h:152 /work/src.git/linux-stable/./include/linux/uaccess.h:151 /work/src.git/linux-stable/drivers/gpu/drm/drm_ioctl.c:888) drm
? nouveau_gem_new (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nouveau_gem.c:263) nouveau
nouveau_drm_ioctl (/work/src.git/linux-stable/drivers/gpu/drm/nouveau/nouveau_drm.c:1120) nouveau
ksys_ioctl (??:?)
__x64_sys_ioctl (??:?)
do_syscall_64 (??:?)
entry_SYSCALL_64_after_hwframe (??:?)
RIP: 0033:0x7f71b5568dd7
Code: 00 00 90 48 8b 05 a9 40 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 79 40 0c 00 f7 d8 64 89 01 48
All code
========
0: 00 00 add %al,(%rax)
2: 90 nop
3: 48 8b 05 a9 40 0c 00 mov 0xc40a9(%rip),%rax # 0xc40b3
a: 64 c7 00 26 00 00 00 movl $0x26,%fs:(%rax)
11: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax
18: c3 retq
19: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
20: 00 00 00
23: b8 10 00 00 00 mov $0x10,%eax
28: 0f 05 syscall
2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <-- trapping instruction
30: 73 01 jae 0x33
32: c3 retq
33: 48 8b 0d 79 40 0c 00 mov 0xc4079(%rip),%rcx # 0xc40b3
3a: f7 d8 neg %eax
3c: 64 89 01 mov %eax,%fs:(%rcx)
3f: 48 rex.W
Code starting with the faulting instruction
===========================================
0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
6: 73 01 jae 0x9
8: c3 retq
9: 48 8b 0d 79 40 0c 00 mov 0xc4079(%rip),%rcx # 0xc4089
10: f7 d8 neg %eax
12: 64 89 01 mov %eax,%fs:(%rcx)
15: 48 rex.W
RSP: 002b:00007fff1a291988 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007fff1a2919d0 RCX: 00007f71b5568dd7
RDX: 00007fff1a2919d0 RSI: 00000000c0306480 RDI: 000000000000000a
RBP: 00000000c0306480 R08: 0000000000000000 R09: 00005575014822e0
R10: 00007f71b562d9e0 R11: 0000000000000246 R12: 00007fff1a2919d0
R13: 000000000000000a R14: 0000557500582e00 R15: 0000000000000000
Modules linked in: essiv authenc dm_crypt binfmt_misc netconsole configfs sha256_generic libsha256 cfg80211 8021q veth cpuid i2c_dev asus_atk0110 acpi_power_meter it87 hwmon_vid nouveau af_packet bridge stp evdev mxm_wmi llc snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic video snd_hda_intel ttm snd_intel_dspcfg drm_kms_helper snd_hda_codec snd_hda_core kvm_amd kvm snd_pcm syscopyarea snd_timer sysfillrect fam15h_power k10temp sysimgblt snd irqbypass fb_sys_fops soundcore i2c_piix4 wmi acpi_cpufreq softdog nfs nfsd auth_rpcgss lockd grace drm sunrpc drm_panel_orientation_quirks backlight agpgart usbhid ohci_pci ghash_clmulni_intel cryptd ehci_pci ohci_hcd sr_mod ehci_hcd cdrom xhci_pci xhci_hcd usbcore usb_common 8250 8250_base serial_core
CR2: 000000010050786b
---[ end trace 67649d0c2234e455 ]---
RIP: 0010:__kmalloc (??:?)
Code: 89 c8 65 48 03 05 3f 29 df 53 48 8b 70 08 48 39 f2 75 e7 4c 8b 28 4d 85 ed 0f 84 d8 00 00 00 41 8b 47 20 49 8b 3f 48 8d 4a 08 <49> 8b 5c 05 00 4c 89 e8 65 48 0f c7 0f 0f 94 c0 84 c0 74 b9 41 8b
All code
========
0: 89 c8 mov %ecx,%eax
2: 65 48 03 05 3f 29 df add %gs:0x53df293f(%rip),%rax # 0x53df2949
9: 53
a: 48 8b 70 08 mov 0x8(%rax),%rsi
e: 48 39 f2 cmp %rsi,%rdx
11: 75 e7 jne 0xfffffffffffffffa
13: 4c 8b 28 mov (%rax),%r13
16: 4d 85 ed test %r13,%r13
19: 0f 84 d8 00 00 00 je 0xf7
1f: 41 8b 47 20 mov 0x20(%r15),%eax
23: 49 8b 3f mov (%r15),%rdi
26: 48 8d 4a 08 lea 0x8(%rdx),%rcx
2a:* 49 8b 5c 05 00 mov 0x0(%r13,%rax,1),%rbx <-- trapping instruction
2f: 4c 89 e8 mov %r13,%rax
32: 65 48 0f c7 0f cmpxchg16b %gs:(%rdi)
37: 0f 94 c0 sete %al
3a: 84 c0 test %al,%al
3c: 74 b9 je 0xfffffffffffffff7
3e: 41 rex.B
3f: 8b .byte 0x8b
Code starting with the faulting instruction
===========================================
0: 49 8b 5c 05 00 mov 0x0(%r13,%rax,1),%rbx
5: 4c 89 e8 mov %r13,%rax
8: 65 48 0f c7 0f cmpxchg16b %gs:(%rdi)
d: 0f 94 c0 sete %al
10: 84 c0 test %al,%al
12: 74 b9 je 0xffffffffffffffcd
14: 41 rex.B
15: 8b .byte 0x8b
RSP: 0018:ffff976e40eb7910 EFLAGS: 00010202
RAX: 0000000000000030 RBX: 0000000000000000 RCX: 0000000001ab8932
RDX: 0000000001ab892a RSI: 0000000001ab892a RDI: 0000000000028a20
RBP: 0000000000000cc0 R08: 000000000000001a R09: 000000000000001a
R10: ffff8deed1efc090 R11: 000000000011b18f R12: 0000000000000052
R13: 000000010050783b R14: ffff8def75c07480 R15: ffff8def75c07480
FS: 00007f71b5e96dc0(0000) GS:ffff8def76c80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000010050786b CR3: 0000000227119000 CR4: 00000000000406e0
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x2b000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Rebooting in 20 seconds..
--
Alan J. Wylie https://www.wylie.me.uk/
Dance like no-one's watching. / Encrypt like everyone is.
Security is inversely proportional to convenience
More information about the Nouveau
mailing list