[Intel-gfx] Oops at shutdown in intel_unpin_fb_obj()
Linus Torvalds
torvalds at linux-foundation.org
Sun Jan 29 19:42:32 UTC 2017
Guys, I've gotten absolutely no response to this, and the problem
seems to still occur.
I just got a slightly different hang at shutdown, due to a kernel oops
that seems related. It's not identical - the call trace is very
different - but it's close.
In particular, it's once again the same NULL pointer dereference in
"intel_unpin_fb_obj()", except this time it looked like this:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
IP: intel_unpin_fb_obj+0x69/0xe0 [i915]
Oops: 0000 [#1] SMP
Modules linked in: fuse xt_CHECKSUM ipt_MASQUERADE
nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
xt_conntrack ebtable_nat ebtable_broute bridge stp llc ip6ta$
tpm_tis industrialio tpm_tis_core acpi_pad tpm nfsd auth_rpcgss
nfs_acl lockd grace sunrpc dm_crypt hid_logitech_hidpp hid_logitech_dj
i915 crct10dif_pclmul i2c_algo_bit crc32_pc$
CPU: 4 PID: 26173 Comm: kworker/u16:9 Tainted: G W
4.10.0-rc5-00111-g49e555a932de #1
Hardware name: System manufacturer System Product Name/Z170-K, BIOS
1803 05/06/2016
Workqueue: i915 intel_unpin_work_fn [i915]
RIP: 0010:intel_unpin_fb_obj+0x69/0xe0 [i915]
RSP: 0000:ffffb95c4937bdc0 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff96f284441340 RCX: 0000000000000000
RDX: ffffb95c4937bdc0 RSI: ffff96f29f273908 RDI: ffff96f284441340
RBP: ffffb95c4937be08 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000fa83b2da R11: 0000000000808111 R12: ffff96f20d878500
R13: 0000000000000001 R14: ffff96f29f58c400 R15: ffff96f29f270068
FS: 0000000000000000(0000) GS:ffff96f2b6d00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000078 CR3: 000000041ff4b000 CR4: 00000000003406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
intel_unpin_work_fn+0x58/0x140 [i915]
process_one_work+0x1f1/0x480
worker_thread+0x48/0x4d0
kthread+0x101/0x140
ret_from_fork+0x29/0x40
Code: ff ff ff 74 67 48 8d 7d b8 44 89 ea 4c 89 e6 e8 ce 2c ff ff 48
8b 43 08 48 8d 55 b8 48 89 df 48 8d b0 08 39 00 00 e8 47 1b fc ff <48>
8b 50 78 48 85 d2 74 04 83 6a 20 01 48 $
RIP: intel_unpin_fb_obj+0x69/0xe0 [i915] RSP: ffffb95c4937bdc0
CR2: 0000000000000078
---[ end trace afab57e9d299b42b ]---
so this time it was the worker thread that died and took the system
down with it.
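For anyone reading the trace: the fault address 0x78 (CR2) is what a NULL base pointer plus a field offset of 120 bytes looks like, which matches the `movq 120(%rax), %rdx` in the quoted analysis below. A minimal sketch of that arithmetic follows. The struct is a made-up layout used only to place a pointer field at offset 0x78; it is not the real i915 vma definition, and the names `mock_vma_layout` and `mock_fault_address` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout, NOT the real i915 structure: 0x78 bytes of
 * opaque padding followed by the pointer field the trapping
 * instruction tried to load.
 */
struct mock_vma_layout {
    unsigned char earlier_fields[0x78];
    void *fence;
};

/*
 * Address the CPU would touch when loading the 'fence' field through
 * a given base pointer value. With base == 0 (NULL) this is just the
 * field offset, which is what shows up in CR2.
 */
static uintptr_t mock_fault_address(uintptr_t base)
{
    return base + offsetof(struct mock_vma_layout, fence);
}
```

With a base of 0, the helper returns the field offset itself, so a small non-zero CR2 like this one usually points at a field access through a NULL struct pointer rather than a wild pointer.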
Anyway, there is something *seriously* wrong with the i915 shutdown sequence.
Now, maybe this was fixed by the recent drm pull, which did have some
i915 fixes in it and which I wasn't running on my desktop yet, but
nothing there looks like an obvious fix.
And once again, I'd like to note that other users of
i915_gem_object_to_ggtt() do seem to check for a NULL vma, while
intel_unpin_fb_obj() simply passes any potential NULL vma to
i915_vma_unpin_fence().
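To make the point concrete, here is a rough, self-contained sketch of what a guarded unpin could look like. Everything in it is a hypothetical stand-in (the `mock_*` types and functions are not the i915 definitions, and this is not a proposed patch); it only mirrors the shape of the decoded code quoted below: look up the vma, load its fence pointer, and decrement a pin count if the fence is set.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; these are NOT the real i915 definitions. */
struct mock_fence_reg {
    int pin_count;
};

struct mock_vma {
    struct mock_fence_reg *fence;
    int display_pins;
};

/*
 * Stand-in for the lookup step: in this sketch the caller passes in
 * whatever the lookup would have returned, which may be NULL.
 */
static struct mock_vma *mock_lookup_ggtt(struct mock_vma *bound)
{
    return bound;
}

/* Drops the fence pin if a fence is attached; vma must not be NULL. */
static void mock_unpin_fence(struct mock_vma *vma)
{
    if (vma->fence)
        vma->fence->pin_count--;
}

/*
 * Guarded unpin: bail out with a diagnostic instead of dereferencing
 * a NULL vma. Returns true if something was unpinned.
 */
static bool mock_unpin_fb_obj(struct mock_vma *bound)
{
    struct mock_vma *vma = mock_lookup_ggtt(bound);

    if (!vma) {
        fprintf(stderr, "unpin: no vma found for framebuffer object\n");
        return false;
    }

    mock_unpin_fence(vma);
    vma->display_pins--;
    return true;
}
```

A check like this would only turn the oops into a warning; it says nothing about why the vma is missing at shutdown in the first place, which is the real question.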
Guys?
Linus
On Sun, Jan 8, 2017 at 3:35 PM, Linus Torvalds
<torvalds at linux-foundation.org> wrote:
> This has so far only happened once, so I don't know how repeatable it
> is, but here goes..
>
> My nice stable XPS13 just oopsed on shutdown. It is possibly related
> to the X server SIGSEGV'ing too, although honestly, I am not sure
> which caused which. Maybe the kernel oops caused the X problem. They
> definitely happened together, and happened as I was shutting down the
> machine.
>
> I'm including the syslog for the Xorg issue too, in case it ends up
> giving people ideas, but the kernel oops is what I actually looked at.
> The code decodes to
>
> 74 67 je 0x69
> 48 8d 7d b8 lea -0x48(%rbp),%rdi
> 44 89 ea mov %r13d,%edx
> 4c 89 e6 mov %r12,%rsi
> e8 3e 2d ff ff callq ..
> 48 8b 43 08 mov 0x8(%rbx),%rax
> 48 8d 55 b8 lea -0x48(%rbp),%rdx
> 48 89 df mov %rbx,%rdi
> 48 8d b0 08 39 00 00 lea 0x3908(%rax),%rsi
> e8 47 1a fc ff callq ..
> * 48 8b 50 78 mov 0x78(%rax),%rdx <-- trapping instruction
> 48 85 d2 test %rdx,%rdx
> 74 04 je 0x35
> 83 6a 20 01 subl $0x1,0x20(%rdx)
> 48 89 c7 mov %rax,%rdi
> e8 c2 60 fc ff callq ..
>
>
> and just comparing it to the generated code it seems to be this:
>
> call i915_gem_obj_to_vma #
> movq 120(%rax), %rdx # MEM[(struct drm_i915_fence_reg * *)_24 + 120B], _15
>
> where %rax (the return value from i915_gem_obj_to_vma()) is NULL.
>
> So it seems to be this code:
>
> ...
> vma = i915_gem_object_to_ggtt(obj, &view);
>
> i915_vma_unpin_fence(vma);
> i915_gem_object_unpin_from_display_plane(vma);
> ...
>
> where vma is NULL.
>
> The other user of i915_gem_object_to_ggtt() does have a test of !vma,
> although with a warning. Which implies it does happen, but shouldn't.
> Maybe consistent with the Xorg confusion?
>
> Linus
>
> ---
>
> gdm-x-session: (II) UnloadModule: "libinput"
> gdm-x-session: (II) systemd-logind: releasing fd for 13:72
> gdm-x-session: (II) UnloadModule: "libinput"
> gdm-x-session: (II) systemd-logind: releasing fd for 13:78
> gdm-x-session: (II) UnloadModule: "libinput"
> gdm-x-session: (II) systemd-logind: releasing fd for 13:66
> gdm-x-session: (II) UnloadModule: "libinput"
> gdm-x-session: (II) systemd-logind: releasing fd for 13:65
> gdm-x-session: (II) UnloadModule: "libinput"
> gdm-x-session: (II) systemd-logind: releasing fd for 13:69
> gdm-x-session: (II) UnloadModule: "libinput"
> gdm-x-session: (II) systemd-logind: releasing fd for 13:67
> gdm-x-session: (EE)
> gdm-x-session: (EE) Backtrace:
> gdm-x-session: (EE) 0: /usr/libexec/Xorg (OsLookupColor+0x139) [0x59f859]
> gdm-x-session: (EE) 1: /lib64/libc.so.6 (__restore_rt+0x0) [0x7fe554e5a7df]
> gdm-x-session: (EE) 2: /usr/lib64/xorg/modules/libfb.so
> (_fbGetWindowPixmap+0xd) [0x7fe54d16b6fd]
> gdm-x-session: (EE) 3: /usr/libexec/Xorg
> (present_extension_init+0x5b7) [0x51b9b7]
> gdm-x-session: (EE) 4: /usr/libexec/Xorg
> (present_extension_init+0x685) [0x51bb95]
> gdm-x-session: (EE) 5: /usr/libexec/Xorg
> (present_extension_init+0xdf2) [0x51ca62]
> gdm-x-session: (EE) 6: /usr/libexec/Xorg (AddTraps+0x9133) [0x523973]
> gdm-x-session: (EE) 7: /usr/libexec/Xorg
> (CompositeRegisterImplicitRedirectionException+0x4098) [0x4ccf58]
> gdm-x-session: (EE) 8: /usr/libexec/Xorg (AddTraps+0x73f4) [0x51fe84]
> gdm-x-session: (EE) 9: /usr/libexec/Xorg (remove_fs_handlers+0x581) [0x43af61]
> gdm-x-session: (EE) 10: /lib64/libc.so.6 (__libc_start_main+0xf1)
> [0x7fe554e46731]
> gdm-x-session: (EE) 11: /usr/libexec/Xorg (_start+0x29) [0x424d59]
> gdm-x-session: (EE) 12: ? (?+0x29) [0x29]
> gdm-x-session: (EE)
> gdm-x-session: (EE) Segmentation fault at address 0x10
> gdm-x-session: (EE)
> gdm-x-session: Fatal server error:
> gdm-x-session: (EE) Caught signal 11 (Segmentation fault). Server aborting
> gdm-x-session: (EE)
> gdm-x-session: (EE)
> gdm-x-session: Please consult the Fedora Project support
> gdm-x-session: at http://wiki.x.org
> gdm-x-session: for help.
> gdm-x-session: (EE) Please also check the log file at
> "/home/torvalds/.local/share/xorg/Xorg.0.log" for additional
> information.
> gdm-x-session: (EE)
> gdm-x-session: (WW) xf86CloseConsole: KDSETMODE failed: Input/output error
> gdm-x-session: (WW) xf86CloseConsole: VT_GETMODE failed: Input/output error
> gdm-x-session: (WW) xf86CloseConsole: VT_ACTIVATE failed: Input/output error
>
> kernel: BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000078
> IP: intel_unpin_fb_obj+0x69/0xe0 [i915]
> PGD 0
> Oops: 0000 [#1] SMP
> Modules linked in: rfcomm fuse ccm ip6t_rpfilter ip6t_REJECT
> nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat
> ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6
> nf_defrag_ipv6 nf_nat_ipv6 ip6table_security ip6table_mangle
> ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
> nf_nat nf_conntrack iptable_security iptable_mangle iptable_raw
> ebtable_filter ebtables ip6table_filter ip6_tables cmac bnep vfat fat
> arc4 snd_hda_codec_hdmi dell_led snd_soc_skl intel_rapl iTCO_wdt
> snd_soc_skl_ipc x86_pkg_temp_thermal intel_powerclamp snd_soc_sst_ipc
> snd_hda_codec_realtek coretemp snd_hda_codec_generic snd_soc_sst_dsp
> snd_hda_ext_core snd_soc_sst_match snd_soc_core
> i2c_designware_platform i2c_designware_core kvm_intel iwlmvm dell_wmi
> snd_hda_intel kvm snd_hda_codec
> snd_hwdep mac80211 snd_hda_core snd_seq irqbypass snd_seq_device
> intel_cstate dell_laptop intel_rapl_perf dell_smbios snd_pcm dcdbas
> iwlwifi rtsx_pci_ms snd_timer memstick snd cfg80211 soundcore i2c_i801
> joydev shpchp btusb btrtl mei_me idma64 processor_thermal_device mei
> intel_lpss_pci intel_soc_dts_iosf intel_pch_thermal wmi hci_uart btbcm
> btqca btintel bluetooth acpi_als pinctrl_sunrisepoint kfifo_buf
> intel_lpss_acpi pinctrl_intel rfkill int3403_thermal industrialio
> intel_lpss int340x_thermal_zone acpi_pad intel_hid tpm_tis
> int3400_thermal tpm_tis_core acpi_thermal_rel sparse_keymap tpm nfsd
> auth_rpcgss nfs_acl lockd grace sunrpc dm_crypt hid_multitouch
> rtsx_pci_sdmmc mmc_core crct10dif_pclmul i915 crc32_pclmul
> crc32c_intel ghash_clmulni_intel i2c_algo_bit serio_raw drm_kms_helper
> syscopyarea nvme sysfillrect nvme_core rtsx_pci sysimgblt
> fb_sys_fops drm i2c_hid video fjes
> CPU: 0 PID: 5083 Comm: systemd-logind Not tainted
> 4.10.0-rc2-00103-g4cf184638bcf #38
> Hardware name: Dell Inc. XPS 13 9350/09JHRY, BIOS 1.4.12 11/30/2016
> task: ffff8d8fe8af8000 task.stack: ffffb5e4c2388000
> RIP: 0010:intel_unpin_fb_obj+0x69/0xe0 [i915]
> RSP: 0018:ffffb5e4c238b7e0 EFLAGS: 00010282
> RAX: 0000000000000000 RBX: ffff8d8fab64e100 RCX: ffff8d8fab64e101
> RDX: ffffb5e4c238b7e0 RSI: ffff8d8fe77eb908 RDI: ffff8d8fab64e100
> RBP: ffffb5e4c238b828 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000007 R11: 00000000000000bf R12: ffff8d8fc64d5900
> R13: 0000000000000001 R14: ffff8d8fe7f6b540 R15: ffff8d8f9c6d6c00
> FS: 00007f7f18786900(0000) GS:ffff8d8ffec00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000078 CR3: 000000046a72f000 CR4: 00000000003406f0
> Call Trace:
> intel_cleanup_plane_fb+0x5b/0xa0 [i915]
> drm_atomic_helper_cleanup_planes+0x6f/0x90 [drm_kms_helper]
> intel_atomic_commit_tail+0x749/0xfe0 [i915]
> intel_atomic_commit+0x3cb/0x4f0 [i915]
> drm_atomic_commit+0x4b/0x50 [drm]
> restore_fbdev_mode+0x14c/0x2a0 [drm_kms_helper]
> drm_fb_helper_restore_fbdev_mode_unlocked+0x34/0x80 [drm_kms_helper]
> drm_fb_helper_set_par+0x2d/0x60 [drm_kms_helper]
> intel_fbdev_set_par+0x18/0x70 [i915]
> fb_set_var+0x236/0x460
> fbcon_blank+0x30f/0x350
> do_unblank_screen+0xd2/0x1a0
> vt_ioctl+0x507/0x12a0
> tty_ioctl+0x355/0xc30
> do_vfs_ioctl+0xa3/0x5e0
> SyS_ioctl+0x79/0x90
> entry_SYSCALL_64_fastpath+0x13/0x94
> RIP: 0033:0x7f7f17850ce7
> RSP: 002b:00007ffe696d9bf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 000000000000001a RCX: 00007f7f17850ce7
> RDX: 0000000000000000 RSI: 0000000000004b3a RDI: 0000000000000015
> RBP: 00007f7f187866c8 R08: 00000016170f1200 R09: 0000000000000009
> R10: 0000000000000075 R11: 0000000000000246 R12: 0000000000000000
> R13: 0000000000000001 R14: 000055f66b267790 R15: 000055f66b25e190
> Code: ff ff ff 74 67 48 8d 7d b8 44 89 ea 4c 89 e6 e8 3e 2d ff ff
> 48 8b 43 08 48 8d 55 b8 48 89 df 48 8d b0 08 39 00 00 e8 47 1a fc ff
> <48> 8b 50 78 48 85 d2 74 04 83 6a 20 01 48 89 c7 e8 c2 60 fc ff
> RIP: intel_unpin_fb_obj+0x69/0xe0 [i915] RSP: ffffb5e4c238b7e0
> CR2: 0000000000000078
> ---[ end trace daf415d61b7a5042 ]---