drm-next-misc merge breaks vmwgfx
Daniel Vetter
daniel.vetter at ffwll.ch
Thu Apr 6 19:52:17 UTC 2017
On Thu, Apr 6, 2017 at 8:01 PM, Thomas Hellstrom <thellstrom at vmware.com> wrote:
> On 04/06/2017 04:46 PM, Daniel Vetter wrote:
>> On Thu, Apr 6, 2017 at 4:10 PM, Thomas Hellstrom <thellstrom at vmware.com> wrote:
>>> On 04/06/2017 02:34 PM, Daniel Vetter wrote:
>>>> Hi Thomas,
>>>>
>>>> Bisected an offender already? Afaik there's no one else who reported
>>>> issues thus far, and for our own CI it seems all still fine.
>>>> -Daniel
>>> Hi, Daniel,
>>>
>>> Yes, I rebased drm-misc-next on top of vmwgfx-next and found the culprit
>>> to be
>>>
>>> 38b6441e "drm/atomic-helper: Remove the backoff hack from set_config.."
>>>
>>> Reverting first 1fa4da04 and then
>>> 38b6441e
>>>
>>> fixes the problem.
>> Yeah, we seem to have a solid functional conflict between the vmwgfx
>> atomic conversion, and the changes in drm-misc-next. Preliminary
>> analysis, but I think what's going on is:
>> - With the above changes in -misc we punt the deadlock retry loop to
>> the callers of ->set_config.
>> - But since it would have been way too invasive, I only fixed up the
>> atomic callers (in most places we have special paths for atomic and
>> non-atomic due to slightly different semantics), which means for
>> legacy functions we in some cases pass a NULL ctx down to
>> ->set_config. But since legacy paths only get called on legacy
>> drivers, no problem.
>> - Well except I've done that audit before vmwgfx became atomic, and
>> that audit is now wrong, and I've forgotten to properly re-audit when
>> the conflicts happened all around. But since I half-expect to hit a
>> mid-driver conversion with this I did sprinkle
>> WARN_ON(drm_drv_uses_atomic_modeset()) over all these paths.
>>
>> So assuming this is correct, you should see a pile of WARN_ON
>> backtraces that you're hitting in the atomic-vmwgfx+drm-misc-next
>> combo. The proper fix would be to switch over to atomic primitives for
>> all these cases. On a quick look I see some in the vmwgfx fbdev
>> emulation code, might even be worth it to check whether we could reuse
>> the core helpers (which do this split handling already) in some cases.
>>
>> Cheers, Daniel
>
> So with the two reverts previously mentioned applied, I see the
> following. Is this consistent with the above?
>
> FWIW I did a pretty big vmwgfx fbdev rewrite some time ago, but at that
> time we didn't have the callbacks
> necessary to use the helpers. Maybe that has changed with the atomic
> implementation.
>
> Considering that Sinclair just had a baby, I'm not 100% sure though,
> that I have time to fix this up in the vmwgfx driver for this merge
> window...
>
> /Thomas
>
>
> [ 9.547101] WARNING: CPU: 3 PID: 359 at
> drivers/gpu/drm/drm_modeset_lock.c:107 drm_modeset_lock_all+0xb8/0xc0 [drm]
> [ 9.547102] Modules linked in: snd_rawmidi snd_timer
> ghash_clmulni_intel intel_rapl_perf ppdev snd_seq_device vmw_balloon snd
> rfkill joydev soundcore nfit parport_pc parport acpi_cpufreq tpm_tis
> tpm_tis_core tpm shpchp vmw_vmci i2c_piix4 nfsd auth_rpcgss nfs_acl
> lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi
> scsi_transport_spi mptscsih crc32c_intel e1000 mptbase ata_generic
> serio_raw pata_acpi uas usb_storage
> [ 9.547122] CPU: 3 PID: 359 Comm: plymouthd Tainted: G W
> 4.11.0-rc4+ #2
> [ 9.547122] Hardware name: VMware, Inc. VMware Virtual Platform/440BX
> Desktop Reference Platform, BIOS 6.00 01/24/2017
> [ 9.547123] Call Trace:
> [ 9.547128] dump_stack+0x63/0x86
> [ 9.547130] __warn+0xcb/0xf0
> [ 9.547131] warn_slowpath_null+0x1d/0x20
> [ 9.547137] drm_modeset_lock_all+0xb8/0xc0 [drm]
> [ 9.547143] vmw_framebuffer_dmabuf_dirty+0x4c/0x200 [vmwgfx]
> [ 9.547145] ? __check_object_size+0x100/0x19d
> [ 9.547152] drm_mode_dirtyfb_ioctl+0x178/0x1a0 [drm]
> [ 9.547158] drm_ioctl+0x209/0x4c0 [drm]
> [ 9.547164] ? drm_mode_getfb+0x100/0x100 [drm]
> [ 9.547165] ? __do_fault+0x1e/0x110
> [ 9.547169] vmw_generic_ioctl+0x193/0x2d0 [vmwgfx]
> [ 9.547175] ? drm_getunique+0xa0/0xa0 [drm]
> [ 9.547179] vmw_unlocked_ioctl+0x15/0x20 [vmwgfx]
> [ 9.547180] do_vfs_ioctl+0xa3/0x5f0
> [ 9.547181] SyS_ioctl+0x79/0x90
> [ 9.547182] do_syscall_64+0x67/0x180
> [ 9.547184] entry_SYSCALL64_slow_path+0x25/0x25
> [ 9.547185] RIP: 0033:0x7fd4c93b7787
> [ 9.547186] RSP: 002b:00007fff17d06b88 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> [ 9.547187] RAX: ffffffffffffffda RBX: 0000000000000c80 RCX:
> 00007fd4c93b7787
> [ 9.547187] RDX: 00007fff17d06bc0 RSI: 00000000c01864b1 RDI:
> 0000000000000009
> [ 9.547188] RBP: 00007fff17d06bc0 R08: 00007fd4c7554000 R09:
> 00007fd4ca1e9010
> [ 9.547188] R10: 0000558ffe14ca40 R11: 0000000000000246 R12:
> 00000000c01864b1
> [ 9.547188] R13: 0000000000000009 R14: 0000000000000000 R15:
> 0000000000000258
> [ 9.547190] ---[ end trace 46a3554c8816a28b ]---
This is an artifact of the two reverts: I forgot to properly clear
config->acquire_ctx again in the intermediate states.
> [ 4.824456] WARNING: CPU: 2 PID: 359 at drivers/gpu/drm/drm_crtc.c:499
> drm_mode_set_config_internal+0x40/0x50 [drm]
> [ 4.824457] Modules linked in: vmwgfx drm_kms_helper ttm drm mptspi
> scsi_transport_spi mptscsih crc32c_intel e1000(+) mptbase ata_generic
> serio_raw pata_acpi uas usb_storage
> [ 4.824467] CPU: 2 PID: 359 Comm: plymouthd Tainted: G W
> 4.11.0-rc4+ #2
> [ 4.824468] Hardware name: VMware, Inc. VMware Virtual Platform/440BX
> Desktop Reference Platform, BIOS 6.00 01/24/2017
> [ 4.824468] Call Trace:
> [ 4.824474] dump_stack+0x63/0x86
> [ 4.824476] __warn+0xcb/0xf0
> [ 4.824477] warn_slowpath_null+0x1d/0x20
> [ 4.824483] drm_mode_set_config_internal+0x40/0x50 [drm]
> [ 4.824492] vmw_fb_set_par+0x269/0x580 [vmwgfx]
> [ 4.824494] ? selinux_capable+0x20/0x30
> [ 4.824498] ? ttm_mem_global_reserve.constprop.6+0xd6/0x100 [ttm]
> [ 4.824503] vmw_fb_on+0x24/0x60 [vmwgfx]
> [ 4.824506] vmw_master_drop+0x81/0xc0 [vmwgfx]
> [ 4.824511] drm_drop_master+0x21/0x50 [drm]
> [ 4.824516] drm_dropmaster_ioctl+0x6c/0x70 [drm]
> [ 4.824521] drm_ioctl+0x209/0x4c0 [drm]
> [ 4.824526] ? drm_setmaster_ioctl+0xa0/0xa0 [drm]
> [ 4.824528] ? do_filp_open+0xa5/0x100
> [ 4.824532] vmw_generic_ioctl+0x193/0x2d0 [vmwgfx]
> [ 4.824537] ? drm_getunique+0xa0/0xa0 [drm]
> [ 4.824541] vmw_unlocked_ioctl+0x15/0x20 [vmwgfx]
> [ 4.824543] do_vfs_ioctl+0xa3/0x5f0
> [ 4.824544] SyS_ioctl+0x79/0x90
> [ 4.824545] do_syscall_64+0x67/0x180
> [ 4.824547] entry_SYSCALL64_slow_path+0x25/0x25
> [ 4.824548] RIP: 0033:0x7fd4c93b7787
> [ 4.824549] RSP: 002b:00007fff17d06d98 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> [ 4.824550] RAX: ffffffffffffffda RBX: 0000558ffe145260 RCX:
> 00007fd4c93b7787
> [ 4.824550] RDX: 0000000000000000 RSI: 000000000000641f RDI:
> 0000000000000009
> [ 4.824551] RBP: 0000000000000000 R08: 00007fd4c967ab98 R09:
> 0000000000000005
> [ 4.824551] R10: 0000558ffe145390 R11: 0000000000000246 R12:
> 000000000000641f
> [ 4.824552] R13: 0000000000000009 R14: 00007fd4c9da78e0 R15:
> 0000000000000000
> [ 4.824553] ---[ end trace 46a3554c8816a28a ]---
Yeah, this is the "don't do that" case that I expected.
> [ 19.720064] WARNING: CPU: 0 PID: 1316 at
> drivers/gpu/drm/drm_modeset_lock.c:107 drm_modeset_lock_all+0xb8/0xc0 [drm]
> [ 19.720065] Modules linked in: xt_CHECKSUM ipt_MASQUERADE
> nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns
> nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
> xt_conntrack ip_set nfnetlink ebtable_broute bridge stp llc ebtable_nat
> ip6table_security ip6table_raw ip6table_mangle ip6table_nat
> nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 iptable_security
> iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
> nf_nat_ipv4 nf_nat nf_conntrack libcrc32c ebtable_filter ebtables
> ip6table_filter ip6_tables vmw_vsock_vmci_transport vsock bnep
> snd_seq_midi snd_seq_midi_event snd_ens1371 gameport snd_ac97_codec
> crct10dif_pclmul ac97_bus btusb btrtl btbcm btintel snd_seq bluetooth
> snd_pcm crc32_pclmul snd_rawmidi snd_timer ghash_clmulni_intel
> intel_rapl_perf ppdev snd_seq_device
> [ 19.720091] vmw_balloon snd rfkill joydev soundcore nfit parport_pc
> parport acpi_cpufreq tpm_tis tpm_tis_core tpm shpchp vmw_vmci i2c_piix4
> nfsd auth_rpcgss nfs_acl lockd grace sunrpc vmwgfx drm_kms_helper ttm
> drm mptspi scsi_transport_spi mptscsih crc32c_intel e1000 mptbase
> ata_generic serio_raw pata_acpi uas usb_storage
> [ 19.720106] CPU: 0 PID: 1316 Comm: Xorg Tainted: G W
> 4.11.0-rc4+ #2
> [ 19.720107] Hardware name: VMware, Inc. VMware Virtual Platform/440BX
> Desktop Reference Platform, BIOS 6.00 01/24/2017
> [ 19.720107] Call Trace:
> [ 19.720113] dump_stack+0x63/0x86
> [ 19.720115] __warn+0xcb/0xf0
> [ 19.720116] warn_slowpath_null+0x1d/0x20
> [ 19.720123] drm_modeset_lock_all+0xb8/0xc0 [drm]
> [ 19.720129] drm_mode_gamma_set_ioctl+0x3a/0x180 [drm]
> [ 19.720134] drm_ioctl+0x209/0x4c0 [drm]
> [ 19.720140] ? drm_mode_crtc_set_gamma_size+0xa0/0xa0 [drm]
> [ 19.720151] ? add_wait_queue+0x65/0x80
> [ 19.720158] vmw_generic_ioctl+0x193/0x2d0 [vmwgfx]
> [ 19.720163] ? drm_getunique+0xa0/0xa0 [drm]
> [ 19.720167] vmw_unlocked_ioctl+0x15/0x20 [vmwgfx]
> [ 19.720169] do_vfs_ioctl+0xa3/0x5f0
> [ 19.720170] ? sk_prot_alloc+0x5/0x120
> [ 19.720171] SyS_ioctl+0x79/0x90
> [ 19.720173] entry_SYSCALL_64_fastpath+0x1a/0xa9
> [ 19.720174] RIP: 0033:0x7f9eb9f24787
> [ 19.720175] RSP: 002b:00007ffd90012b88 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> [ 19.720176] RAX: ffffffffffffffda RBX: 000000000222ffe0 RCX:
> 00007f9eb9f24787
> [ 19.720176] RDX: 00007ffd90012bc0 RSI: 00000000c02064a5 RDI:
> 000000000000000c
> [ 19.720176] RBP: 00007f9eba1e43c0 R08: 0000000002130fb0 R09:
> 00000000021311b0
> [ 19.720177] R10: 0000000000000088 R11: 0000000000000246 R12:
> 0000000000000000
> [ 19.720177] R13: 00007f9ebc6822a8 R14: 00007f9eb9f9b5e0 R15:
> 00007ffd9000eeb0
> [ 19.720179] ---[ end trace 46a3554c8816a293 ]---
> [ 31.611886] systemd-journald[600]: File
> /var/log/journal/fbbc68aec3984fd6b148a9830a1096e0/user-2000.journal
> corrupted or uncleanly shut down, renaming and replacing.
> [ 31.937861] ------------[ cut here ]------------
This is again the leaked acquire_ctx that isn't properly cleared due
to your reverts (well, my not-perfectly-bisectable patches).
I think it should be simple to type up a quick patch to make the
vmwgfx fbdev code work again; I'll submit that asap.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
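
For readers following the thread, the "deadlock retry loop punted to the
callers of ->set_config" can be illustrated with a small userspace mock.
This is only a sketch under stated assumptions, not the actual drm-misc-next
code: every type and function name below (mock_acquire_ctx, mock_crtc,
mock_set_config, mock_backoff, mock_caller_set_config) is hypothetical, and
returning -EINVAL for a NULL ctx is merely a stand-in for the WARN_ON
mentioned above. In the kernel, the corresponding backoff step is
drm_modeset_backoff(), and contention is signalled with -EDEADLK.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an acquire context; not the real DRM struct. */
struct mock_acquire_ctx {
    int contended; /* set when a lock attempt hit contention */
    int backoffs;  /* number of times the caller backed off */
};

/* Hypothetical stand-in for a CRTC. */
struct mock_crtc {
    int fail_times; /* simulate this many -EDEADLK results */
    int configured; /* set to 1 once set_config succeeds */
};

/*
 * Mock ->set_config: it no longer retries internally. On contention it
 * returns -EDEADLK and leaves the retry to its caller. A NULL ctx models
 * the legacy path; rejecting it with -EINVAL is an assumption standing in
 * for the WARN_ON discussed in the thread.
 */
int mock_set_config(struct mock_crtc *crtc, struct mock_acquire_ctx *ctx)
{
    if (ctx == NULL)
        return -EINVAL;
    if (crtc->fail_times > 0) {
        crtc->fail_times--;
        ctx->contended = 1;
        return -EDEADLK;
    }
    crtc->configured = 1;
    return 0;
}

/* Mock backoff: clear the contention flag and count the retry. */
void mock_backoff(struct mock_acquire_ctx *ctx)
{
    ctx->contended = 0;
    ctx->backoffs++;
}

/* The caller owns the context and the retry loop. */
int mock_caller_set_config(struct mock_crtc *crtc, int *backoffs)
{
    struct mock_acquire_ctx ctx = { 0, 0 };
    int ret;

    assert(crtc != NULL);
retry:
    ret = mock_set_config(crtc, &ctx);
    if (ret == -EDEADLK) {
        mock_backoff(&ctx);
        goto retry;
    }
    if (backoffs != NULL)
        *backoffs = ctx.backoffs;
    return ret;
}
```

With a mock_crtc whose fail_times is 2, mock_caller_set_config() backs off
twice and then succeeds, while a direct mock_set_config() call with a NULL
ctx is rejected. That mirrors why a legacy-style call path becomes a problem
once a driver has been converted to atomic.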