[Intel-gfx] Oops at shutdown in intel_unpin_fb_obj()

Maarten Lankhorst maarten.lankhorst at linux.intel.com
Mon Jan 30 12:18:32 UTC 2017


On 30-01-17 at 10:38, Daniel Vetter wrote:
> On Sun, Jan 29, 2017 at 11:42:32AM -0800, Linus Torvalds wrote:
>> Guys, I've gotten absolutely no response to this, and the problem
>> seems to still occur.
>>
>> I just got a slightly different hang at shutdown, due to a kernel oops
>> that seems related. It's not identical - the call trace is very
>> different - but it's close.
>>
>> In particular, it's once again the same NULL pointer dereference in
>> "intel_unpin_fb_obj()", except this time it looked like this:
>>
>>   BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
>>   IP: intel_unpin_fb_obj+0x69/0xe0 [i915]
>>   Oops: 0000 [#1] SMP
>>   Modules linked in: fuse xt_CHECKSUM ipt_MASQUERADE
>> nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
>> xt_conntrack ebtable_nat ebtable_broute bridge stp llc ip6ta$
>>    tpm_tis industrialio tpm_tis_core acpi_pad tpm nfsd auth_rpcgss
>> nfs_acl lockd grace sunrpc dm_crypt hid_logitech_hidpp hid_logitech_dj
>> i915 crct10dif_pclmul i2c_algo_bit crc32_pc$
>>   CPU: 4 PID: 26173 Comm: kworker/u16:9 Tainted: G        W
>> 4.10.0-rc5-00111-g49e555a932de #1
>>   Hardware name: System manufacturer System Product Name/Z170-K, BIOS
>> 1803 05/06/2016
>>   Workqueue: i915 intel_unpin_work_fn [i915]
>>   RIP: 0010:intel_unpin_fb_obj+0x69/0xe0 [i915]
>>   RSP: 0000:ffffb95c4937bdc0 EFLAGS: 00010286
>>   RAX: 0000000000000000 RBX: ffff96f284441340 RCX: 0000000000000000
>>   RDX: ffffb95c4937bdc0 RSI: ffff96f29f273908 RDI: ffff96f284441340
>>   RBP: ffffb95c4937be08 R08: 0000000000000000 R09: 0000000000000000
>>   R10: 00000000fa83b2da R11: 0000000000808111 R12: ffff96f20d878500
>>   R13: 0000000000000001 R14: ffff96f29f58c400 R15: ffff96f29f270068
>>   FS:  0000000000000000(0000) GS:ffff96f2b6d00000(0000) knlGS:0000000000000000
>>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>   CR2: 0000000000000078 CR3: 000000041ff4b000 CR4: 00000000003406e0
>>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>   Call Trace:
>>    intel_unpin_work_fn+0x58/0x140 [i915]
>>    process_one_work+0x1f1/0x480
>>    worker_thread+0x48/0x4d0
>>    kthread+0x101/0x140
>>    ret_from_fork+0x29/0x40
>>   Code: ff ff ff 74 67 48 8d 7d b8 44 89 ea 4c 89 e6 e8 ce 2c ff ff 48
>> 8b 43 08 48 8d 55 b8 48 89 df 48 8d b0 08 39 00 00 e8 47 1b fc ff <48>
>> 8b 50 78 48 85 d2 74 04 83 6a 20 01 48 $
>>   RIP: intel_unpin_fb_obj+0x69/0xe0 [i915] RSP: ffffb95c4937bdc0
>>   CR2: 0000000000000078
>>   ---[ end trace afab57e9d299b42b ]---
>>
>> so this time it was the worker thread that died and took the system
>> down with it.
>>
>> Anyway, there is something *seriously* wrong with the i915 shutdown sequence.
>>
>> Now, maybe this was fixed with the recent drm pull that did have some
>> i915 fixes in it, and I wasn't running on my desktop yet, but nothing
>> there looks very obvious.
>>
>> And once again, I'd like to note that other users of
>> i915_gem_object_to_ggtt() do seem to check for a NULL vma, while
>> intel_unpin_fb_obj() simply passes any potential NULL vma to
>> i915_vma_unpin_fence().
>>
>> Guys?
> Hm, fell through the cracks somehow :( It's the vma tracking mixup, which
> is properly fixed for 4.11. We're not handling the different flavours of
> gpu mappings correctly, so if you mix tiling (because of the partial mmap
> stuff we've enabled recently) and rotation and stuff it eventually goes
> boom. The trouble is that the proper fix also involves core drm modeset
> changes, and lots of small shuffling in i915, so no way material for
> -fixes. We're discussing on irc what could be done, one option might be to
> disable the partial mmap stuff again to hide the bug as well as before
> (trading in some userspace faults resulting in your compositor blowing up
> in corner cases, but older bugs win in no-regression land). Or we shrug it
> off as unlikely and accept the leak and make the WARN_ON you added silent
> for 4.10.
> -Daniel

Here are the required patches backported to fix it in 4.10.

There's also a nasty double free in drm_atomic_ioctl which needs a separate fix.

~Maarten

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-drm-atomic-Unconditionally-call-prepare_fb.patch
Type: text/x-patch
Size: 2468 bytes
URL: <https://lists.freedesktop.org/archives/intel-gfx/attachments/20170130/77459b9e/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-drm-i915-Track-pinned-vma-in-intel_plane_state.patch
Type: text/x-patch
Size: 21968 bytes
URL: <https://lists.freedesktop.org/archives/intel-gfx/attachments/20170130/77459b9e/attachment-0003.bin>

