drm-next-misc merge breaks vmwgfx

Daniel Vetter daniel.vetter at ffwll.ch
Thu Apr 6 19:52:17 UTC 2017


On Thu, Apr 6, 2017 at 8:01 PM, Thomas Hellstrom <thellstrom at vmware.com> wrote:
> On 04/06/2017 04:46 PM, Daniel Vetter wrote:
>> On Thu, Apr 6, 2017 at 4:10 PM, Thomas Hellstrom <thellstrom at vmware.com> wrote:
>>> On 04/06/2017 02:34 PM, Daniel Vetter wrote:
>>>> Hi Thomas,
>>>>
>>>> Bisected an offender already? Afaik there's no one else who reported
>>>> issues thus far, and for our own CI it seems all still fine.
>>>> -Daniel
>>> Hi, Daniel,
>>>
>>> Yes, I rebased drm-misc-next on top of vmwgfx-next and found the culprit
>>> to be
>>>
>>> 38b6441e "drm/atomic-helper: Remove the backoff hack from set_config.."
>>>
>>> Reverting first 1fa4da04 and then
>>> 38b6441e
>>>
>>> fixes the problem.
>> Yeah, we seem to have a solid functional conflict between the vmwgfx
>> atomic conversion, and the changes in drm-misc-next. Preliminary
>> analysis, but I think what's going on is:
>> - With the above changes in -misc we punt the deadlock retry loop to
>> the callers of ->set_config.
>> - But since it would have been way too invasive, I only fixed up the
>> atomic callers (in most places we have special paths for atomic and
>> non-atomic due to slightly different semantics), which means for
>> legacy functions we in some cases pass a NULL ctx down to
>> ->set_config. But since legacy paths only get called on legacy
>> drivers, no problem.
>> - Well except I've done that audit before vmwgfx became atomic, and
>> that audit is now wrong, and I've forgotten to properly re-audit when
>> the conflicts happened all around. But since I half-expect to hit a
>> mid-driver conversion with this I did sprinkle
>> WARN_ON(drm_drv_uses_atomic_modeset()) over all these paths.
>>
>> So assuming this is correct, you should see a pile of WARN_ON
>> backtraces that you're hitting in the atomic-vmwgfx+drm-misc-next
>> combo. The proper fix would be to switch over to atomic primitives for
>> all these cases. On a quick look I see some in the vmwgfx fbdev
>> emulation code, might even be worth it to check whether we could reuse
>> the core helpers (which do this split handling already) in some cases.
>>
>> Cheers, Daniel
>
> So with the two reverts previously mentioned applied, I see the
> following. Is this consistent with the above?
>
> FWIW I did a pretty big vmwgfx fbdev rewrite some time ago, but at that
> time we didn't have the callbacks
> necessary to use the helpers. Maybe that has changed with the atomic
> implementation.
>
> Considering that Sinclair just had a baby, I'm not 100% sure though,
> that I have time to fix this up in the vmwgfx driver for this merge
> window...
>
> /Thomas
>
>
> [    9.547101] WARNING: CPU: 3 PID: 359 at
> drivers/gpu/drm/drm_modeset_lock.c:107 drm_modeset_lock_all+0xb8/0xc0 [drm]
> [    9.547102] Modules linked in: snd_rawmidi snd_timer
> ghash_clmulni_intel intel_rapl_perf ppdev snd_seq_device vmw_balloon snd
> rfkill joydev soundcore nfit parport_pc parport acpi_cpufreq tpm_tis
> tpm_tis_core tpm shpchp vmw_vmci i2c_piix4 nfsd auth_rpcgss nfs_acl
> lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi
> scsi_transport_spi mptscsih crc32c_intel e1000 mptbase ata_generic
> serio_raw pata_acpi uas usb_storage
> [    9.547122] CPU: 3 PID: 359 Comm: plymouthd Tainted: G        W
> 4.11.0-rc4+ #2
> [    9.547122] Hardware name: VMware, Inc. VMware Virtual Platform/440BX
> Desktop Reference Platform, BIOS 6.00 01/24/2017
> [    9.547123] Call Trace:
> [    9.547128]  dump_stack+0x63/0x86
> [    9.547130]  __warn+0xcb/0xf0
> [    9.547131]  warn_slowpath_null+0x1d/0x20
> [    9.547137]  drm_modeset_lock_all+0xb8/0xc0 [drm]
> [    9.547143]  vmw_framebuffer_dmabuf_dirty+0x4c/0x200 [vmwgfx]
> [    9.547145]  ? __check_object_size+0x100/0x19d
> [    9.547152]  drm_mode_dirtyfb_ioctl+0x178/0x1a0 [drm]
> [    9.547158]  drm_ioctl+0x209/0x4c0 [drm]
> [    9.547164]  ? drm_mode_getfb+0x100/0x100 [drm]
> [    9.547165]  ? __do_fault+0x1e/0x110
> [    9.547169]  vmw_generic_ioctl+0x193/0x2d0 [vmwgfx]
> [    9.547175]  ? drm_getunique+0xa0/0xa0 [drm]
> [    9.547179]  vmw_unlocked_ioctl+0x15/0x20 [vmwgfx]
> [    9.547180]  do_vfs_ioctl+0xa3/0x5f0
> [    9.547181]  SyS_ioctl+0x79/0x90
> [    9.547182]  do_syscall_64+0x67/0x180
> [    9.547184]  entry_SYSCALL64_slow_path+0x25/0x25
> [    9.547185] RIP: 0033:0x7fd4c93b7787
> [    9.547186] RSP: 002b:00007fff17d06b88 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> [    9.547187] RAX: ffffffffffffffda RBX: 0000000000000c80 RCX:
> 00007fd4c93b7787
> [    9.547187] RDX: 00007fff17d06bc0 RSI: 00000000c01864b1 RDI:
> 0000000000000009
> [    9.547188] RBP: 00007fff17d06bc0 R08: 00007fd4c7554000 R09:
> 00007fd4ca1e9010
> [    9.547188] R10: 0000558ffe14ca40 R11: 0000000000000246 R12:
> 00000000c01864b1
> [    9.547188] R13: 0000000000000009 R14: 0000000000000000 R15:
> 0000000000000258
> [    9.547190] ---[ end trace 46a3554c8816a28b ]---

This is an artifact of the two reverts: I forgot to properly clear
config->acquire_ctx again in the intermediate states.
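
To illustrate why a leaked ctx trips that warning, here is a minimal
userspace sketch, not the kernel code. It assumes, as the backtrace
suggests, that the lock-all path warns when config->acquire_ctx is still
set on entry. All mock_* names are hypothetical stand-ins invented for
this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an acquire context; contents do not matter here. */
struct mock_ctx {
	int unused;
};

/* Hypothetical stand-in for the mode config that tracks the open lock-all section. */
struct mock_mode_config {
	struct mock_ctx *acquire_ctx; /* non-NULL while a lock-all section is open */
	int warnings;                 /* counts how often the stale-ctx check fired */
};

/* Open a lock-all section; complain if a previous section never cleaned up. */
static void mock_lock_all(struct mock_mode_config *cfg, struct mock_ctx *ctx)
{
	assert(cfg != NULL);
	if (cfg->acquire_ctx != NULL) {
		cfg->warnings++;
		fprintf(stderr, "WARNING: stale acquire_ctx on lock_all\n");
	}
	cfg->acquire_ctx = ctx;
}

/* Correct teardown: clears the pointer so the next lock_all starts clean. */
static void mock_unlock_all(struct mock_mode_config *cfg)
{
	assert(cfg != NULL);
	cfg->acquire_ctx = NULL;
}

/* Assumed buggy intermediate state: the section ends, the pointer stays set. */
static void mock_unlock_all_leaky(struct mock_mode_config *cfg)
{
	assert(cfg != NULL);
	/* deliberately does not clear cfg->acquire_ctx */
}
```

Calling mock_lock_all() after mock_unlock_all_leaky() bumps the warning
counter, whereas a lock/unlock pair using mock_unlock_all() stays quiet.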

> [    4.824456] WARNING: CPU: 2 PID: 359 at drivers/gpu/drm/drm_crtc.c:499
> drm_mode_set_config_internal+0x40/0x50 [drm]
> [    4.824457] Modules linked in: vmwgfx drm_kms_helper ttm drm mptspi
> scsi_transport_spi mptscsih crc32c_intel e1000(+) mptbase ata_generic
> serio_raw pata_acpi uas usb_storage
> [    4.824467] CPU: 2 PID: 359 Comm: plymouthd Tainted: G        W
> 4.11.0-rc4+ #2
> [    4.824468] Hardware name: VMware, Inc. VMware Virtual Platform/440BX
> Desktop Reference Platform, BIOS 6.00 01/24/2017
> [    4.824468] Call Trace:
> [    4.824474]  dump_stack+0x63/0x86
> [    4.824476]  __warn+0xcb/0xf0
> [    4.824477]  warn_slowpath_null+0x1d/0x20
> [    4.824483]  drm_mode_set_config_internal+0x40/0x50 [drm]
> [    4.824492]  vmw_fb_set_par+0x269/0x580 [vmwgfx]
> [    4.824494]  ? selinux_capable+0x20/0x30
> [    4.824498]  ? ttm_mem_global_reserve.constprop.6+0xd6/0x100 [ttm]
> [    4.824503]  vmw_fb_on+0x24/0x60 [vmwgfx]
> [    4.824506]  vmw_master_drop+0x81/0xc0 [vmwgfx]
> [    4.824511]  drm_drop_master+0x21/0x50 [drm]
> [    4.824516]  drm_dropmaster_ioctl+0x6c/0x70 [drm]
> [    4.824521]  drm_ioctl+0x209/0x4c0 [drm]
> [    4.824526]  ? drm_setmaster_ioctl+0xa0/0xa0 [drm]
> [    4.824528]  ? do_filp_open+0xa5/0x100
> [    4.824532]  vmw_generic_ioctl+0x193/0x2d0 [vmwgfx]
> [    4.824537]  ? drm_getunique+0xa0/0xa0 [drm]
> [    4.824541]  vmw_unlocked_ioctl+0x15/0x20 [vmwgfx]
> [    4.824543]  do_vfs_ioctl+0xa3/0x5f0
> [    4.824544]  SyS_ioctl+0x79/0x90
> [    4.824545]  do_syscall_64+0x67/0x180
> [    4.824547]  entry_SYSCALL64_slow_path+0x25/0x25
> [    4.824548] RIP: 0033:0x7fd4c93b7787
> [    4.824549] RSP: 002b:00007fff17d06d98 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> [    4.824550] RAX: ffffffffffffffda RBX: 0000558ffe145260 RCX:
> 00007fd4c93b7787
> [    4.824550] RDX: 0000000000000000 RSI: 000000000000641f RDI:
> 0000000000000009
> [    4.824551] RBP: 0000000000000000 R08: 00007fd4c967ab98 R09:
> 0000000000000005
> [    4.824551] R10: 0000558ffe145390 R11: 0000000000000246 R12:
> 000000000000641f
> [    4.824552] R13: 0000000000000009 R14: 00007fd4c9da78e0 R15:
> 0000000000000000
> [    4.824553] ---[ end trace 46a3554c8816a28a ]---

Yeah, this is the "don't do that" case that I expected.
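
For reference, the caller-side retry that the atomic paths are now
expected to do looks roughly like the sketch below. This is a userspace
model under stated assumptions, not the helper code: the mock_* names
are hypothetical, the "deadlock" is simulated with a counter, and a NULL
ctx simply fails here, where the real legacy path warns instead.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an acquire context; not the kernel struct. */
struct mock_acquire_ctx {
	int contended; /* set when the last attempt lost a lock race */
	int retries;   /* number of backoffs performed so far */
};

/*
 * Hypothetical ->set_config stand-in. It no longer retries internally:
 * it reports -EDEADLK and leaves the backoff to whoever owns the ctx.
 */
static int mock_set_config(struct mock_acquire_ctx *ctx, int *deadlocks_left)
{
	if (ctx == NULL)
		return -EINVAL; /* no ctx from the caller, so nothing can back off */
	if (*deadlocks_left > 0) {
		(*deadlocks_left)--;
		ctx->contended = 1;
		return -EDEADLK;
	}
	return 0;
}

/* Model of a backoff: clear the contended marker and count the retry. */
static void mock_backoff(struct mock_acquire_ctx *ctx)
{
	assert(ctx != NULL);
	ctx->contended = 0;
	ctx->retries++;
}

/* The caller owns the ctx and therefore owns the retry loop. */
static int mock_caller_set_config(int deadlocks, int *retries_out)
{
	struct mock_acquire_ctx ctx = { 0, 0 };
	int ret;

	for (;;) {
		ret = mock_set_config(&ctx, &deadlocks);
		if (ret != -EDEADLK)
			break;
		mock_backoff(&ctx);
	}
	if (retries_out != NULL)
		*retries_out = ctx.retries;
	return ret;
}
```

The point of the model is only the ownership: whoever creates the ctx
loops on -EDEADLK, and a path that hands down NULL has no way to take
part in that loop.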

> [   19.720064] WARNING: CPU: 0 PID: 1316 at
> drivers/gpu/drm/drm_modeset_lock.c:107 drm_modeset_lock_all+0xb8/0xc0 [drm]
> [   19.720065] Modules linked in: xt_CHECKSUM ipt_MASQUERADE
> nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns
> nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
> xt_conntrack ip_set nfnetlink ebtable_broute bridge stp llc ebtable_nat
> ip6table_security ip6table_raw ip6table_mangle ip6table_nat
> nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 iptable_security
> iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
> nf_nat_ipv4 nf_nat nf_conntrack libcrc32c ebtable_filter ebtables
> ip6table_filter ip6_tables vmw_vsock_vmci_transport vsock bnep
> snd_seq_midi snd_seq_midi_event snd_ens1371 gameport snd_ac97_codec
> crct10dif_pclmul ac97_bus btusb btrtl btbcm btintel snd_seq bluetooth
> snd_pcm crc32_pclmul snd_rawmidi snd_timer ghash_clmulni_intel
> intel_rapl_perf ppdev snd_seq_device
> [   19.720091]  vmw_balloon snd rfkill joydev soundcore nfit parport_pc
> parport acpi_cpufreq tpm_tis tpm_tis_core tpm shpchp vmw_vmci i2c_piix4
> nfsd auth_rpcgss nfs_acl lockd grace sunrpc vmwgfx drm_kms_helper ttm
> drm mptspi scsi_transport_spi mptscsih crc32c_intel e1000 mptbase
> ata_generic serio_raw pata_acpi uas usb_storage
> [   19.720106] CPU: 0 PID: 1316 Comm: Xorg Tainted: G        W
> 4.11.0-rc4+ #2
> [   19.720107] Hardware name: VMware, Inc. VMware Virtual Platform/440BX
> Desktop Reference Platform, BIOS 6.00 01/24/2017
> [   19.720107] Call Trace:
> [   19.720113]  dump_stack+0x63/0x86
> [   19.720115]  __warn+0xcb/0xf0
> [   19.720116]  warn_slowpath_null+0x1d/0x20
> [   19.720123]  drm_modeset_lock_all+0xb8/0xc0 [drm]
> [   19.720129]  drm_mode_gamma_set_ioctl+0x3a/0x180 [drm]
> [   19.720134]  drm_ioctl+0x209/0x4c0 [drm]
> [   19.720140]  ? drm_mode_crtc_set_gamma_size+0xa0/0xa0 [drm]
> [   19.720151]  ? add_wait_queue+0x65/0x80
> [   19.720158]  vmw_generic_ioctl+0x193/0x2d0 [vmwgfx]
> [   19.720163]  ? drm_getunique+0xa0/0xa0 [drm]
> [   19.720167]  vmw_unlocked_ioctl+0x15/0x20 [vmwgfx]
> [   19.720169]  do_vfs_ioctl+0xa3/0x5f0
> [   19.720170]  ? sk_prot_alloc+0x5/0x120
> [   19.720171]  SyS_ioctl+0x79/0x90
> [   19.720173]  entry_SYSCALL_64_fastpath+0x1a/0xa9
> [   19.720174] RIP: 0033:0x7f9eb9f24787
> [   19.720175] RSP: 002b:00007ffd90012b88 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> [   19.720176] RAX: ffffffffffffffda RBX: 000000000222ffe0 RCX:
> 00007f9eb9f24787
> [   19.720176] RDX: 00007ffd90012bc0 RSI: 00000000c02064a5 RDI:
> 000000000000000c
> [   19.720176] RBP: 00007f9eba1e43c0 R08: 0000000002130fb0 R09:
> 00000000021311b0
> [   19.720177] R10: 0000000000000088 R11: 0000000000000246 R12:
> 0000000000000000
> [   19.720177] R13: 00007f9ebc6822a8 R14: 00007f9eb9f9b5e0 R15:
> 00007ffd9000eeb0
> [   19.720179] ---[ end trace 46a3554c8816a293 ]---
> [   31.611886] systemd-journald[600]: File
> /var/log/journal/fbbc68aec3984fd6b148a9830a1096e0/user-2000.journal
> corrupted or uncleanly shut down, renaming and replacing.
> [   31.937861] ------------[ cut here ]------------

This is again the leaked acquire_ctx that isn't properly cleared due
to your reverts (well, my not-perfectly-bisectable patches).

I think it should be simple to type up a quick patch to make the
vmwgfx fbdev code work again; I'll submit that asap.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

