General protection fault: RIP: 0010:free_block+0xdc/0x1f0

Dave Airlie airlied at gmail.com
Tue Sep 15 21:49:46 UTC 2020


cc'ing some more people.

On Tue, 15 Sep 2020 at 23:07, Paul Menzel <pmenzel at molgen.mpg.de> wrote:
>
> Dear Andrew, dear Linux folks,
>
>
> With Linux 5.9-rc4 on a Dell OptiPlex 5080 with an Intel Core i7-10700 CPU
> @ 2.90GHz and an external graphics card
>
>      01:00.0 VGA compatible controller [0300]: Advanced Micro Devices,
> Inc. [AMD/ATI] Oland [Radeon HD 8570 / R7 240/340 OEM] [1002:6611] (rev 87)
>
> running the graphically demanding applications glmark2 [1] and the Phoronix
> Test Suite [2] benchmark *pts/desktop-graphics* [3]
>
>      $ git describe --tags
>      v10.0.0m1-13-g0b5ddc3c0
>
> I got three general protection faults, after which the system either
> restarted or froze (no input devices working, screen frozen, and even the
> network card unresponsive (no ping)).
>
> Here the system restarted itself:
>
> > kernel: general protection fault, probably for non-canonical address 0xdead000000000100: 0000 [#1] SMP NOPTI
> > kernel: CPU: 2 PID: 9702 Comm: glmark2 Kdump: loaded Not tainted 5.9.0-rc4.mx64.343 #1
> > kernel: Hardware name: Dell Inc. OptiPlex 5080/032W55, BIOS 1.1.7 08/17/2020
> > kernel: RIP: 0010:free_block+0xdc/0x1f0
>
> Here it froze:
>
> > [14639.665745] general protection fault, probably for non-canonical address 0xdead000000000100: 0000 [#1] SMP NOPTI
> > [14639.675917] CPU: 15 PID: 23094 Comm: pvpython Kdump: loaded Not tainted 5.9.0-rc4.mx64.343 #1
> > [14639.684431] Hardware name: Dell Inc. OptiPlex 5080/032W55, BIOS 1.1.7 08/17/2020
> > [14639.691823] RIP: 0010:free_block+0xdc/0x1f0
>
> Here it froze:
>
> > kernel: general protection fault, probably for non-canonical address 0xdead000000000100: 0000 [#1] SMP NOPTI
> > kernel: CPU: 15 PID: 23094 Comm: pvpython Kdump: loaded Not tainted 5.9.0-rc4.mx64.343 #1
> > kernel: Hardware name: Dell Inc. OptiPlex 5080/032W55, BIOS 1.1.7 08/17/2020
> > kernel: RIP: 0010:free_block+0xdc/0x1f0
>
> Running `scripts/decode_stacktrace.sh`:
>
> > linux-5.9_rc4-343.x86_64/source$ scripts/decode_stacktrace.sh vmlinux < optiplex-5080-linux-5.9-rc4-gp-pvpython.txt
> > [14528.718656] cgroup: fork rejected by pids controller in /user.slice/user-5272.slice/session-c6.scope
> > [14639.665745] general protection fault, probably for non-canonical address 0xdead000000000100: 0000 [#1] SMP NOPTI
> > [14639.675917] CPU: 15 PID: 23094 Comm: pvpython Kdump: loaded Not tainted 5.9.0-rc4.mx64.343 #1
> > [14639.684431] Hardware name: Dell Inc. OptiPlex 5080/032W55, BIOS 1.1.7 08/17/2020
> > [14639.691823] RIP: 0010:free_block (./include/linux/list.h:112 ./include/linux/list.h:135 ./include/linux/list.h:146 mm/slab.c:3336)
> > [14639.696006] Code: 00 48 01 d0 48 c1 e8 0c 48 c1 e0 06 4c 01 e8 48 8b 50 08 48 8d 4a ff 83 e2 01 48 0f 45 c1 48 8b 48 08 48 8b 50 10 4c 8d 78 08 <48> 89 51 08 48 89 0a 4c 89 da 48 2b 50 28 4c 89 60 08 48 89 68 10
> > All code
> > ========
> >    0: 00 48 01                add    %cl,0x1(%rax)
> >    3: d0 48 c1                rorb   -0x3f(%rax)
> >    6: e8 0c 48 c1 e0          callq  0xffffffffe0c14817
> >    b: 06                      (bad)
> >    c: 4c 01 e8                add    %r13,%rax
> >    f: 48 8b 50 08             mov    0x8(%rax),%rdx
> >   13: 48 8d 4a ff             lea    -0x1(%rdx),%rcx
> >   17: 83 e2 01                and    $0x1,%edx
> >   1a: 48 0f 45 c1             cmovne %rcx,%rax
> >   1e: 48 8b 48 08             mov    0x8(%rax),%rcx
> >   22: 48 8b 50 10             mov    0x10(%rax),%rdx
> >   26: 4c 8d 78 08             lea    0x8(%rax),%r15
> >   2a:*        48 89 51 08             mov    %rdx,0x8(%rcx)           <-- trapping instruction
> >   2e: 48 89 0a                mov    %rcx,(%rdx)
> >   31: 4c 89 da                mov    %r11,%rdx
> >   34: 48 2b 50 28             sub    0x28(%rax),%rdx
> >   38: 4c 89 60 08             mov    %r12,0x8(%rax)
> >   3c: 48 89 68 10             mov    %rbp,0x10(%rax)
> >
> > Code starting with the faulting instruction
> > ===========================================
> >    0: 48 89 51 08             mov    %rdx,0x8(%rcx)
> >    4: 48 89 0a                mov    %rcx,(%rdx)
> >    7: 4c 89 da                mov    %r11,%rdx
> >    a: 48 2b 50 28             sub    0x28(%rax),%rdx
> >    e: 4c 89 60 08             mov    %r12,0x8(%rax)
> >   12: 48 89 68 10             mov    %rbp,0x10(%rax)
> > [14639.714747] RSP: 0018:ffffc9001c26fab8 EFLAGS: 00010046
> > [14639.719970] RAX: ffffea000d193600 RBX: 0000000080000000 RCX: dead000000000100
> > [14639.727099] RDX: dead000000000122 RSI: ffff88842d5f3ef0 RDI: ffff88842b440300
> > [14639.734225] RBP: dead000000000122 R08: ffffc9001c26fb30 R09: ffff88842b441280
> > [14639.741351] R10: 000000000000000f R11: ffff8883464d80c0 R12: dead000000000100
> > [14639.748477] R13: ffffea0000000000 R14: ffff88842d5f3ff0 R15: ffffea000d193608
> > [14639.755604] FS:  00007fd3b7e8f040(0000) GS:ffff88842d5c0000(0000) knlGS:0000000000000000
> > [14639.763692] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [14639.769430] CR2: 00007fd344233548 CR3: 00000002f46aa003 CR4: 00000000007706e0
> > [14639.776556] PKRU: 55555554
> > [14639.779265] Call Trace:
> > [14639.781717] ___cache_free (mm/slab.c:3389 mm/slab.c:3455)
> > [14639.785463] kfree (./arch/x86/include/asm/irqflags.h:41 ./arch/x86/include/asm/irqflags.h:84 mm/slab.c:3757)
> > [14639.788432] kmem_freepages (mm/slab.h:266 mm/slab.h:437 mm/slab.c:1406)
> > [14639.792093] slab_destroy (mm/slab.c:1631)
> > [14639.795579] slabs_destroy (mm/slab.c:1639 (discriminator 12))
> > [14639.799152] ___cache_free (mm/slab.c:3406 mm/slab.c:3455)
> > [14639.802902] ? _cond_resched (kernel/sched/core.c:6123)
> > [14639.806650] kfree (./arch/x86/include/asm/irqflags.h:41 ./arch/x86/include/asm/irqflags.h:84 mm/slab.c:3757)
> > [14639.809644] amdgpu_vram_mgr_del (drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:439) amdgpu
> > [14639.814524] ttm_bo_cleanup_memtype_use (drivers/gpu/drm/ttm/ttm_bo.c:866 drivers/gpu/drm/ttm/ttm_bo.c:367) ttm
> > [14639.819748] ttm_bo_put (./include/linux/dma-resv.h:226 drivers/gpu/drm/ttm/ttm_bo.c:612 ./include/linux/kref.h:65 drivers/gpu/drm/ttm/ttm_bo.c:624) ttm
> > [14639.823768] amdgpu_bo_unref (drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:861) amdgpu
> > [14639.828313] amdgpu_vm_free_table (drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:953) amdgpu
> > [14639.833293] amdgpu_vm_free_pts (drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:975) amdgpu
> > [14639.838097] amdgpu_vm_fini (drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:3119) amdgpu
> > [14639.842727] amdgpu_driver_postclose_kms (drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c:1116) amdgpu
> > [14639.848387] drm_file_free.part.9 (drivers/gpu/drm/drm_file.c:292) drm
> > [14639.853263] drm_release (./arch/x86/include/asm/atomic.h:123 ./include/asm-generic/atomic-instrumented.h:749 drivers/gpu/drm/drm_file.c:496) drm
> > [14639.857183] __fput (fs/file_table.c:282)
> > [14639.860238] task_work_run (kernel/task_work.c:143 (discriminator 1))
> > [14639.863811] exit_to_user_mode_prepare (./include/linux/tracehook.h:188 kernel/entry/common.c:163 kernel/entry/common.c:190)
> > [14639.868602] syscall_exit_to_user_mode (./arch/x86/include/asm/atomic.h:29 ./include/asm-generic/atomic-instrumented.h:28 ./include/linux/jump_label.h:254 ./arch/x86/include/asm/nospec-branch.h:288 ./arch/x86/include/asm/entry-common.h:80 kernel/entry/common.c:131 kernel/entry/common.c:267)
> > [14639.873304] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:125)
> > [14639.878353] RIP: 0033:0x7fd3d715cb5f
> > [14639.881925] Code: 20 00 f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 53 89 fb 48 83 ec 10 e8 bc fb ff ff 89 df 89 c2 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 15 89 d7 89 44 24 0c e8 fe fb ff ff 8b 44 24
> > All code
> > ========
> >    0: 20 00                   and    %al,(%rax)
> >    2: f7 d8                   neg    %eax
> >    4: 64 89 02                mov    %eax,%fs:(%rdx)
> >    7: b8 ff ff ff ff          mov    $0xffffffff,%eax
> >    c: c3                      retq
> >    d: 66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
> >   13: 53                      push   %rbx
> >   14: 89 fb                   mov    %edi,%ebx
> >   16: 48 83 ec 10             sub    $0x10,%rsp
> >   1a: e8 bc fb ff ff          callq  0xfffffffffffffbdb
> >   1f: 89 df                   mov    %ebx,%edi
> >   21: 89 c2                   mov    %eax,%edx
> >   23: b8 03 00 00 00          mov    $0x3,%eax
> >   28: 0f 05                   syscall
> >   2a:*        48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax         <-- trapping instruction
> >   30: 77 15                   ja     0x47
> >   32: 89 d7                   mov    %edx,%edi
> >   34: 89 44 24 0c             mov    %eax,0xc(%rsp)
> >   38: e8 fe fb ff ff          callq  0xfffffffffffffc3b
> >   3d: 8b                      .byte 0x8b
> >   3e: 44                      rex.R
> >   3f: 24                      .byte 0x24
> >
> > Code starting with the faulting instruction
> > ===========================================
> >    0: 48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax
> >    6: 77 15                   ja     0x1d
> >    8: 89 d7                   mov    %edx,%edi
> >    a: 89 44 24 0c             mov    %eax,0xc(%rsp)
> >    e: e8 fe fb ff ff          callq  0xfffffffffffffc11
> >   13: 8b                      .byte 0x8b
> >   14: 44                      rex.R
> >   15: 24                      .byte 0x24
> > [14639.900667] RSP: 002b:00007fff07ed2f40 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
> > [14639.908229] RAX: 0000000000000000 RBX: 0000000000000008 RCX: 00007fd3d715cb5f
> > [14639.915354] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000008
> > [14639.922480] RBP: 0000000000000000 R08: 0000000000000000 R09: 000000000000000e
> > [14639.929607] R10: 000000000000000c R11: 0000000000000293 R12: 0000000005168450
> > [14639.936732] R13: 0000000000000008 R14: 00000000007c8290 R15: 00007fff07ed31c0
> > [14639.943859] Modules linked in: rpcsec_gss_krb5 nfsv4 nfs 8021q garp stp mrp llc snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio i915 amdgpu gpu_sched ttm input_leds x86_pkg_temp_thermal iosf_mbi led_class drm_kms_helper kvm_intel snd_hda_codec_hdmi drm snd_hda_intel intel_gtt snd_intel_dspcfg kvm fb_sys_fops syscopyarea snd_hda_codec snd_hda_core sysfillrect wmi_bmof sysimgblt snd_pcm irqbypass wmi snd_timer snd deflate iTCO_wdt soundcore iTCO_vendor_support crc32c_intel efi_pstore video pstore nfsd auth_rpcgss nfs_acl lockd grace sunrpc efivarfs ip_tables x_tables unix ipv6 autofs4
> > [14639.996237] ---[ end trace c4d9d5f7e4b117a6 ]---
> > [14640.705681] RIP: 0010:free_block (./include/linux/list.h:112 ./include/linux/list.h:135 ./include/linux/list.h:146 mm/slab.c:3336)
> > [14640.709874] Code: 00 48 01 d0 48 c1 e8 0c 48 c1 e0 06 4c 01 e8 48 8b 50 08 48 8d 4a ff 83 e2 01 48 0f 45 c1 48 8b 48 08 48 8b 50 10 4c 8d 78 08 <48> 89 51 08 48 89 0a 4c 89 da 48 2b 50 28 4c 89 60 08 48 89 68 10
> > All code
> > ========
> >    0: 00 48 01                add    %cl,0x1(%rax)
> >    3: d0 48 c1                rorb   -0x3f(%rax)
> >    6: e8 0c 48 c1 e0          callq  0xffffffffe0c14817
> >    b: 06                      (bad)
> >    c: 4c 01 e8                add    %r13,%rax
> >    f: 48 8b 50 08             mov    0x8(%rax),%rdx
> >   13: 48 8d 4a ff             lea    -0x1(%rdx),%rcx
> >   17: 83 e2 01                and    $0x1,%edx
> >   1a: 48 0f 45 c1             cmovne %rcx,%rax
> >   1e: 48 8b 48 08             mov    0x8(%rax),%rcx
> >   22: 48 8b 50 10             mov    0x10(%rax),%rdx
> >   26: 4c 8d 78 08             lea    0x8(%rax),%r15
> >   2a:*        48 89 51 08             mov    %rdx,0x8(%rcx)           <-- trapping instruction
> >   2e: 48 89 0a                mov    %rcx,(%rdx)
> >   31: 4c 89 da                mov    %r11,%rdx
> >   34: 48 2b 50 28             sub    0x28(%rax),%rdx
> >   38: 4c 89 60 08             mov    %r12,0x8(%rax)
> >   3c: 48 89 68 10             mov    %rbp,0x10(%rax)
> >
> > Code starting with the faulting instruction
> > ===========================================
> >    0: 48 89 51 08             mov    %rdx,0x8(%rcx)
> >    4: 48 89 0a                mov    %rcx,(%rdx)
> >    7: 4c 89 da                mov    %r11,%rdx
> >    a: 48 2b 50 28             sub    0x28(%rax),%rdx
> >    e: 4c 89 60 08             mov    %r12,0x8(%rax)
> >   12: 48 89 68 10             mov    %rbp,0x10(%rax)
> > [14640.728612] RSP: 0018:ffffc9001c26fab8 EFLAGS: 00010046
> > [14640.733834] RAX: ffffea000d193600 RBX: 0000000080000000 RCX: dead000000000100
> > [14640.740962] RDX: dead000000000122 RSI: ffff88842d5f3ef0 RDI: ffff88842b440300
> > [14640.748092] RBP: dead000000000122 R08: ffffc9001c26fb30 R09: ffff88842b441280
> > [14640.755218] R10: 000000000000000f R11: ffff8883464d80c0 R12: dead000000000100
> > [14640.762348] R13: ffffea0000000000 R14: ffff88842d5f3ff0 R15: ffffea000d193608
> > [14640.769478] FS:  00007fd3b7e8f040(0000) GS:ffff88842d5c0000(0000) knlGS:0000000000000000
> > [14640.777558] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [14640.783327] CR2: 00007fd344233548 CR3: 00000002f46aa003 CR4: 00000000007706e0
> > [14640.790476] PKRU: 55555554
> > [14661.818409] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> > [14661.824340] rcu:     6-...0: (1 GPs behind) idle=83a/1/0x4000000000000000 softirq=545426/545427 fqs=1448
> > [14661.833636]  (detected by 10, t=21025 jiffies, g=3736877, q=2158)
> > [14661.839726] Task dump for CPU 6:
> > [14661.842952] task:kworker/6:2     state:R  running task     stack:    0 pid: 7383 ppid:     2 flags:0x00004008
> > [14661.852856] Workqueue: events cache_reap
> > [14661.856779] Call Trace:
> > [14661.859230] ? cache_reap (mm/slab.c:3978)
> > [14661.862804] ? process_one_work (./arch/x86/include/asm/atomic.h:29 ./include/asm-generic/atomic-instrumented.h:28 ./include/linux/jump_label.h:254 ./include/linux/jump_label.h:264 ./include/trace/events/workqueue.h:108 kernel/workqueue.c:2274)
> > [14661.866987] ? cancel_delayed_work (kernel/workqueue.c:2358)
> > [14661.871254] ? worker_thread (./include/linux/list.h:282 kernel/workqueue.c:2416)
> > [14661.875087] ? cancel_delayed_work (kernel/workqueue.c:2358)
> > [14661.879354] ? kthread (kernel/kthread.c:292)
> > [14661.882756] ? kthread_use_mm (kernel/kthread.c:245)
> > [14661.886589] ? ret_from_fork (arch/x86/entry/entry_64.S:294)
> > [14726.905632] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> > [14726.911561] rcu:     6-...0: (1 GPs behind) idle=83a/1/0x4000000000000000 softirq=545426/545427 fqs=1735
> > [14726.920856]  (detected by 10, t=86112 jiffies, g=3736877, q=3398)
> > [14726.926946] Task dump for CPU 6:
> > [14726.930172] task:kworker/6:2     state:R  running task     stack:    0 pid: 7383 ppid:     2 flags:0x00004008
> > [14726.940076] Workqueue: events cache_reap
> > [14726.943994] Call Trace:
> > [14726.946445] ? cache_reap (mm/slab.c:3978)
> > [14726.950019] ? process_one_work (./arch/x86/include/asm/atomic.h:29 ./include/asm-generic/atomic-instrumented.h:28 ./include/linux/jump_label.h:254 ./include/linux/jump_label.h:264 ./include/trace/events/workqueue.h:108 kernel/workqueue.c:2274)
> > [14726.954203] ? cancel_delayed_work (kernel/workqueue.c:2358)
> > [14726.958470] ? worker_thread (./include/linux/list.h:282 kernel/workqueue.c:2416)
> > [14726.962307] ? cancel_delayed_work (kernel/workqueue.c:2358)
> > [14726.966575] ? kthread (kernel/kthread.c:292)
> > [14726.969976] ? kthread_use_mm (kernel/kthread.c:245)
> > [14726.973809] ? ret_from_fork (arch/x86/entry/entry_64.S:294)
>
> Is that a known issue? Reproducing the problem often takes several
> hours, so some guidance on what to try would be great.
>
>
> Kind regards,
>
> Paul
>
>
> [1]: https://github.com/glmark2/glmark2
> [2]: https://phoronix-test-suite.com/
> [3]: https://openbenchmarking.org/suite/pts/desktop-graphics
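
For anyone reading the register dump above: the faulting address
0xdead000000000100 (RCX, R12) and the value 0xdead000000000122 (RDX, RBP)
look like the kernel's list poison values, LIST_POISON1 and LIST_POISON2
from include/linux/poison.h, which list_del() stores into an entry after
unlinking it. The trapping instruction is the `next->prev = prev` store in
__list_del(), so the trace is consistent with free_block() calling
list_del() on an entry that had already been removed from its list. The
sketch below only checks that arithmetic. It assumes the x86-64 default
poison offset of 0xdead000000000000 and 48-bit virtual addresses; the
function names are made up for illustration and are not kernel APIs.

```python
# Assumed: default CONFIG_ILLEGAL_POINTER_VALUE on x86-64.
POISON_POINTER_DELTA = 0xDEAD000000000000

# Offsets as defined in include/linux/poison.h (5.9 era).
LIST_POISON1 = 0x100 + POISON_POINTER_DELTA  # written to entry->next by list_del()
LIST_POISON2 = 0x122 + POISON_POINTER_DELTA  # written to entry->prev by list_del()


def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if bits 63..(va_bits-1) of addr are all 0 or all 1."""
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


def classify_pointer(value: int) -> str:
    """Hypothetical helper: label a register value from an oops dump."""
    if value == LIST_POISON1:
        return "LIST_POISON1 (next pointer of an already deleted list entry)"
    if value == LIST_POISON2:
        return "LIST_POISON2 (prev pointer of an already deleted list entry)"
    if not is_canonical(value):
        return "non-canonical address (not a known list poison value)"
    return "canonical address"


if __name__ == "__main__":
    # Register values copied from the oops quoted above.
    registers = {
        "RAX": 0xFFFFEA000D193600,
        "RCX": 0xDEAD000000000100,
        "RDX": 0xDEAD000000000122,
    }
    for name, value in registers.items():
        print(f"{name}: {value:#018x} -> {classify_pointer(value)}")
```

This does not say which caller removed the entry first; it only narrows
the symptom to a double removal (or a stale slab page) rather than a
random wild pointer.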

