[Intel-gfx] Oops at shutdown in intel_unpin_fb_obj()

Linus Torvalds torvalds at linux-foundation.org
Sun Jan 29 19:42:32 UTC 2017


Guys, I've gotten absolutely no response to this, and the problem
seems to still occur.

I just got a slightly different hang at shutdown, due to a kernel oops
that seems related. It's not identical - the call trace is very
different - but it's close.

In particular, it's once again the same NULL pointer dereference in
"intel_unpin_fb_obj()", except this time it looked like this:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
  IP: intel_unpin_fb_obj+0x69/0xe0 [i915]
  Oops: 0000 [#1] SMP
  Modules linked in: fuse xt_CHECKSUM ipt_MASQUERADE
nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
xt_conntrack ebtable_nat ebtable_broute bridge stp llc ip6ta$
   tpm_tis industrialio tpm_tis_core acpi_pad tpm nfsd auth_rpcgss
nfs_acl lockd grace sunrpc dm_crypt hid_logitech_hidpp hid_logitech_dj
i915 crct10dif_pclmul i2c_algo_bit crc32_pc$
  CPU: 4 PID: 26173 Comm: kworker/u16:9 Tainted: G        W
4.10.0-rc5-00111-g49e555a932de #1
  Hardware name: System manufacturer System Product Name/Z170-K, BIOS
1803 05/06/2016
  Workqueue: i915 intel_unpin_work_fn [i915]
  RIP: 0010:intel_unpin_fb_obj+0x69/0xe0 [i915]
  RSP: 0000:ffffb95c4937bdc0 EFLAGS: 00010286
  RAX: 0000000000000000 RBX: ffff96f284441340 RCX: 0000000000000000
  RDX: ffffb95c4937bdc0 RSI: ffff96f29f273908 RDI: ffff96f284441340
  RBP: ffffb95c4937be08 R08: 0000000000000000 R09: 0000000000000000
  R10: 00000000fa83b2da R11: 0000000000808111 R12: ffff96f20d878500
  R13: 0000000000000001 R14: ffff96f29f58c400 R15: ffff96f29f270068
  FS:  0000000000000000(0000) GS:ffff96f2b6d00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000078 CR3: 000000041ff4b000 CR4: 00000000003406e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   intel_unpin_work_fn+0x58/0x140 [i915]
   process_one_work+0x1f1/0x480
   worker_thread+0x48/0x4d0
   kthread+0x101/0x140
   ret_from_fork+0x29/0x40
  Code: ff ff ff 74 67 48 8d 7d b8 44 89 ea 4c 89 e6 e8 ce 2c ff ff 48
8b 43 08 48 8d 55 b8 48 89 df 48 8d b0 08 39 00 00 e8 47 1b fc ff <48>
8b 50 78 48 85 d2 74 04 83 6a 20 01 48 $
  RIP: intel_unpin_fb_obj+0x69/0xe0 [i915] RSP: ffffb95c4937bdc0
  CR2: 0000000000000078
  ---[ end trace afab57e9d299b42b ]---

so this time it was the worker thread that died and took the system
down with it.
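
The fault address is consistent with a NULL base pointer plus a field
offset: RAX is 0 and CR2 is 0x78, which matches the "movq 120(%rax)"
load in the generated code quoted below. A minimal userspace sketch of
that arithmetic follows. The struct mock_vma_layout and its padding are
hypothetical stand-ins chosen to put a pointer at byte 120; they are not
the real i915 layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: padding chosen so 'fence' sits at byte 120 (0x78). */
struct mock_vma_layout {
	unsigned char other_fields[120];
	void *fence;
};

/* Compile-time check that the mock really places 'fence' at 0x78. */
static_assert(offsetof(struct mock_vma_layout, fence) == 0x78,
	      "fence must sit at offset 0x78 in this mock");

/* Address that a load of base->fence would touch for a given base value. */
static uintptr_t fault_address(uintptr_t base)
{
	return base + offsetof(struct mock_vma_layout, fence);
}
```

With a base of 0 this yields 0x78, the same value the oops reports in
CR2.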

Anyway, there is something *seriously* wrong with the i915 shutdown sequence.

Now, maybe this was fixed by the recent drm pull, which did have some
i915 fixes in it and which I wasn't running on my desktop yet, but
nothing there looks like an obvious fix for this.

And once again, I'd like to note that other users of
i915_gem_object_to_ggtt() do seem to check for a NULL vma, while
intel_unpin_fb_obj() simply passes any potential NULL vma to
i915_vma_unpin_fence().
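
A rough userspace sketch of the kind of guard described above. All of
the mock_* names are hypothetical stand-ins, not the real i915 types or
functions, and whether an early return is the right fix for the driver
(as opposed to making sure the lookup never fails) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the driver types; not the kernel definitions. */
struct mock_fence { int pin_count; };
struct mock_vma { struct mock_fence *fence; int pin_count; };

/* Stand-in for the lookup: returns NULL when there is no binding. */
static struct mock_vma *mock_lookup_ggtt(struct mock_vma *bound)
{
	return bound;
}

/* Returns 0 on success, -1 if the lookup produced no vma. */
static int mock_unpin_fb_obj(struct mock_vma *bound)
{
	struct mock_vma *vma = mock_lookup_ggtt(bound);

	/* The guard: bail out instead of dereferencing a NULL vma. */
	if (!vma) {
		fprintf(stderr, "unpin: no vma found, skipping\n");
		return -1;
	}

	/* Drop the fence pin only if a fence is attached. */
	if (vma->fence)
		vma->fence->pin_count--;

	/* Unpinning something that was never pinned would be a bug. */
	assert(vma->pin_count > 0);
	vma->pin_count--;
	return 0;
}
```

Calling mock_unpin_fb_obj(NULL) returns -1 without touching memory,
while a valid vma has both its own and its fence's pin count dropped.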

Guys?

                       Linus


On Sun, Jan 8, 2017 at 3:35 PM, Linus Torvalds
<torvalds at linux-foundation.org> wrote:
> This has so far only happened once, so I don't know how repeatable it
> is, but here goes..
>
> My nice stable XPS13 just oopsed on shutdown. It is possibly related
> to the X server SIGSEGV'ing too, although honestly, I am not sure
> which caused which. Maybe the kernel oops caused the X problem. They
> definitely happened together, and happened as I was shutting down the
> machine.
>
> I'm including the syslog for the Xorg issue too, in case it ends up
> giving people ideas, but the kernel oops is what I actually looked at.
> The code decodes to
>
>         74 67                   je     0x69
>         48 8d 7d b8             lea    -0x48(%rbp),%rdi
>         44 89 ea                mov    %r13d,%edx
>         4c 89 e6                mov    %r12,%rsi
>         e8 3e 2d ff ff          callq  ..
>         48 8b 43 08             mov    0x8(%rbx),%rax
>         48 8d 55 b8             lea    -0x48(%rbp),%rdx
>         48 89 df                mov    %rbx,%rdi
>         48 8d b0 08 39 00 00    lea    0x3908(%rax),%rsi
>         e8 47 1a fc ff          callq  ..
> *       48 8b 50 78             mov    0x78(%rax),%rdx          <-- trapping instruction
>         48 85 d2                test   %rdx,%rdx
>         74 04                   je     0x35
>         83 6a 20 01             subl   $0x1,0x20(%rdx)
>         48 89 c7                mov    %rax,%rdi
>         e8 c2 60 fc ff          callq  ..
>
>
> and just comparing it to the generated code it seems to be this:
>
>         call    i915_gem_obj_to_vma     #
>         movq    120(%rax), %rdx # MEM[(struct drm_i915_fence_reg * *)_24 + 120B], _15
>
> where %rax (the return value from i915_gem_obj_to_vma()) is NULL.
>
> So it seems to be this code:
>
>         ...
>         vma = i915_gem_object_to_ggtt(obj, &view);
>
>         i915_vma_unpin_fence(vma);
>         i915_gem_object_unpin_from_display_plane(vma);
>         ...
>
> where vma is NULL.
>
> The other user of i915_gem_object_to_ggtt() does have a test of !vma,
> although with a warning. Which implies it does happen, but shouldn't.
> Maybe consistent with the Xorg confusion?
>
>                         Linus
>
> ---
>
>   gdm-x-session: (II) UnloadModule: "libinput"
>   gdm-x-session: (II) systemd-logind: releasing fd for 13:72
>   gdm-x-session: (II) UnloadModule: "libinput"
>   gdm-x-session: (II) systemd-logind: releasing fd for 13:78
>   gdm-x-session: (II) UnloadModule: "libinput"
>   gdm-x-session: (II) systemd-logind: releasing fd for 13:66
>   gdm-x-session: (II) UnloadModule: "libinput"
>   gdm-x-session: (II) systemd-logind: releasing fd for 13:65
>   gdm-x-session: (II) UnloadModule: "libinput"
>   gdm-x-session: (II) systemd-logind: releasing fd for 13:69
>   gdm-x-session: (II) UnloadModule: "libinput"
>   gdm-x-session: (II) systemd-logind: releasing fd for 13:67
>   gdm-x-session: (EE)
>   gdm-x-session: (EE) Backtrace:
>   gdm-x-session: (EE) 0: /usr/libexec/Xorg (OsLookupColor+0x139) [0x59f859]
>   gdm-x-session: (EE) 1: /lib64/libc.so.6 (__restore_rt+0x0) [0x7fe554e5a7df]
>   gdm-x-session: (EE) 2: /usr/lib64/xorg/modules/libfb.so
> (_fbGetWindowPixmap+0xd) [0x7fe54d16b6fd]
>   gdm-x-session: (EE) 3: /usr/libexec/Xorg
> (present_extension_init+0x5b7) [0x51b9b7]
>   gdm-x-session: (EE) 4: /usr/libexec/Xorg
> (present_extension_init+0x685) [0x51bb95]
>   gdm-x-session: (EE) 5: /usr/libexec/Xorg
> (present_extension_init+0xdf2) [0x51ca62]
>   gdm-x-session: (EE) 6: /usr/libexec/Xorg (AddTraps+0x9133) [0x523973]
>   gdm-x-session: (EE) 7: /usr/libexec/Xorg
> (CompositeRegisterImplicitRedirectionException+0x4098) [0x4ccf58]
>   gdm-x-session: (EE) 8: /usr/libexec/Xorg (AddTraps+0x73f4) [0x51fe84]
>   gdm-x-session: (EE) 9: /usr/libexec/Xorg (remove_fs_handlers+0x581) [0x43af61]
>   gdm-x-session: (EE) 10: /lib64/libc.so.6 (__libc_start_main+0xf1)
> [0x7fe554e46731]
>   gdm-x-session: (EE) 11: /usr/libexec/Xorg (_start+0x29) [0x424d59]
>   gdm-x-session: (EE) 12: ? (?+0x29) [0x29]
>   gdm-x-session: (EE)
>   gdm-x-session: (EE) Segmentation fault at address 0x10
>   gdm-x-session: (EE)
>   gdm-x-session: Fatal server error:
>   gdm-x-session: (EE) Caught signal 11 (Segmentation fault). Server aborting
>   gdm-x-session: (EE)
>   gdm-x-session: (EE)
>   gdm-x-session: Please consult the Fedora Project support
>   gdm-x-session:          at http://wiki.x.org
>   gdm-x-session:  for help.
>   gdm-x-session: (EE) Please also check the log file at
> "/home/torvalds/.local/share/xorg/Xorg.0.log" for additional
> information.
>   gdm-x-session: (EE)
>   gdm-x-session: (WW) xf86CloseConsole: KDSETMODE failed: Input/output error
>   gdm-x-session: (WW) xf86CloseConsole: VT_GETMODE failed: Input/output error
>   gdm-x-session: (WW) xf86CloseConsole: VT_ACTIVATE failed: Input/output error
>
>   kernel: BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000078
>    IP: intel_unpin_fb_obj+0x69/0xe0 [i915]
>    PGD 0
>    Oops: 0000 [#1] SMP
>    Modules linked in: rfcomm fuse ccm ip6t_rpfilter ip6t_REJECT
> nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat
> ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6
> nf_defrag_ipv6 nf_nat_ipv6 ip6table_security ip6table_mangle
> ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
> nf_nat nf_conntrack iptable_security iptable_mangle iptable_raw
> ebtable_filter ebtables ip6table_filter ip6_tables cmac bnep vfat fat
> arc4 snd_hda_codec_hdmi dell_led snd_soc_skl intel_rapl iTCO_wdt
> snd_soc_skl_ipc x86_pkg_temp_thermal intel_powerclamp snd_soc_sst_ipc
> snd_hda_codec_realtek coretemp snd_hda_codec_generic snd_soc_sst_dsp
> snd_hda_ext_core snd_soc_sst_match snd_soc_core
> i2c_designware_platform i2c_designware_core kvm_intel iwlmvm dell_wmi
> snd_hda_intel kvm snd_hda_codec
>     snd_hwdep mac80211 snd_hda_core snd_seq irqbypass snd_seq_device
> intel_cstate dell_laptop intel_rapl_perf dell_smbios snd_pcm dcdbas
> iwlwifi rtsx_pci_ms snd_timer memstick snd cfg80211 soundcore i2c_i801
> joydev shpchp btusb btrtl mei_me idma64 processor_thermal_device mei
> intel_lpss_pci intel_soc_dts_iosf intel_pch_thermal wmi hci_uart btbcm
> btqca btintel bluetooth acpi_als pinctrl_sunrisepoint kfifo_buf
> intel_lpss_acpi pinctrl_intel rfkill int3403_thermal industrialio
> intel_lpss int340x_thermal_zone acpi_pad intel_hid tpm_tis
> int3400_thermal tpm_tis_core acpi_thermal_rel sparse_keymap tpm nfsd
> auth_rpcgss nfs_acl lockd grace sunrpc dm_crypt hid_multitouch
> rtsx_pci_sdmmc mmc_core crct10dif_pclmul i915 crc32_pclmul
> crc32c_intel ghash_clmulni_intel i2c_algo_bit serio_raw drm_kms_helper
>     syscopyarea nvme sysfillrect nvme_core rtsx_pci sysimgblt
> fb_sys_fops drm i2c_hid video fjes
>    CPU: 0 PID: 5083 Comm: systemd-logind Not tainted
> 4.10.0-rc2-00103-g4cf184638bcf #38
>    Hardware name: Dell Inc. XPS 13 9350/09JHRY, BIOS 1.4.12 11/30/2016
>    task: ffff8d8fe8af8000 task.stack: ffffb5e4c2388000
>    RIP: 0010:intel_unpin_fb_obj+0x69/0xe0 [i915]
>    RSP: 0018:ffffb5e4c238b7e0 EFLAGS: 00010282
>    RAX: 0000000000000000 RBX: ffff8d8fab64e100 RCX: ffff8d8fab64e101
>    RDX: ffffb5e4c238b7e0 RSI: ffff8d8fe77eb908 RDI: ffff8d8fab64e100
>    RBP: ffffb5e4c238b828 R08: 0000000000000000 R09: 0000000000000000
>    R10: 0000000000000007 R11: 00000000000000bf R12: ffff8d8fc64d5900
>    R13: 0000000000000001 R14: ffff8d8fe7f6b540 R15: ffff8d8f9c6d6c00
>    FS:  00007f7f18786900(0000) GS:ffff8d8ffec00000(0000) knlGS:0000000000000000
>    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>    CR2: 0000000000000078 CR3: 000000046a72f000 CR4: 00000000003406f0
>    Call Trace:
>     intel_cleanup_plane_fb+0x5b/0xa0 [i915]
>     drm_atomic_helper_cleanup_planes+0x6f/0x90 [drm_kms_helper]
>     intel_atomic_commit_tail+0x749/0xfe0 [i915]
>     intel_atomic_commit+0x3cb/0x4f0 [i915]
>     drm_atomic_commit+0x4b/0x50 [drm]
>     restore_fbdev_mode+0x14c/0x2a0 [drm_kms_helper]
>     drm_fb_helper_restore_fbdev_mode_unlocked+0x34/0x80 [drm_kms_helper]
>     drm_fb_helper_set_par+0x2d/0x60 [drm_kms_helper]
>     intel_fbdev_set_par+0x18/0x70 [i915]
>     fb_set_var+0x236/0x460
>     fbcon_blank+0x30f/0x350
>     do_unblank_screen+0xd2/0x1a0
>     vt_ioctl+0x507/0x12a0
>     tty_ioctl+0x355/0xc30
>     do_vfs_ioctl+0xa3/0x5e0
>     SyS_ioctl+0x79/0x90
>     entry_SYSCALL_64_fastpath+0x13/0x94
>    RIP: 0033:0x7f7f17850ce7
>    RSP: 002b:00007ffe696d9bf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>    RAX: ffffffffffffffda RBX: 000000000000001a RCX: 00007f7f17850ce7
>    RDX: 0000000000000000 RSI: 0000000000004b3a RDI: 0000000000000015
>    RBP: 00007f7f187866c8 R08: 00000016170f1200 R09: 0000000000000009
>    R10: 0000000000000075 R11: 0000000000000246 R12: 0000000000000000
>    R13: 0000000000000001 R14: 000055f66b267790 R15: 000055f66b25e190
>    Code: ff ff ff 74 67 48 8d 7d b8 44 89 ea 4c 89 e6 e8 3e 2d ff ff
> 48 8b 43 08 48 8d 55 b8 48 89 df 48 8d b0 08 39 00 00 e8 47 1a fc ff
> <48> 8b 50 78 48 85 d2 74 04 83 6a 20 01 48 89 c7 e8 c2 60 fc ff
>    RIP: intel_unpin_fb_obj+0x69/0xe0 [i915] RSP: ffffb5e4c238b7e0
>    CR2: 0000000000000078
>    ---[ end trace daf415d61b7a5042 ]---

