[Intel-gfx] 4.10-rc2 oops in DRM connector code

Daniel Vetter daniel at ffwll.ch
Mon Jan 9 10:15:16 UTC 2017


On Thu, Jan 05, 2017 at 11:03:44AM -0800, Dave Hansen wrote:
> My Thinkpad x260 doesn't like to be unplugged from its dock.  I don't
> think this is a new bug.  It's happening on my distro's 4.4 kernel
> as well.
> 
> The actual oops is in device_del().  It appears to have been passed a
> null 'struct device *'.
> 
> There appears to have been a race _around_ here fixed in 1f7717552e.
> I've looked for and tried to find the locking that prevents
> drm_connector_unregister() from being called twice concurrently.  I'm
> unable to find anything.
> 
> drm_dp_destroy_connector_work() has some locking that looks useful:
> 
> 	mutex_lock(&mgr->destroy_connector_lock)
> 
> but it's released before the offending call:
> 
> 	mgr->cbs->destroy_connector(mgr, port->connector);
> 
> which actually calls intel_dp_destroy_mst_connector().  I have no idea
> if it's correct (and haven't even run it with lockdep), but the attached
> patch does seem to fix my oopses.
> 
> Any ideas?
> 
> > Jan  5 10:22:32 ray kernel: [  537.087042] BUG: unable to handle kernel NULL pointer dereference at 000000000000009e
> > Jan  5 10:22:32 ray kernel: [  537.087954] IP: device_del+0x19/0x330
> > Jan  5 10:22:32 ray kernel: [  537.088860] PGD 0
> > Jan  5 10:22:32 ray kernel: [  537.088860]
> > Jan  5 10:22:32 ray kernel: [  537.090578] Oops: 0000 [#1] SMP
> > Jan  5 10:22:32 ray kernel: [  537.091406] Modules linked in: ctr ccm ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp llc iptable_filter ip_tables ebtable_nat ebtables x_tables cmac rfcomm bnep dm_crypt arc4 iwlmvm mac80211 snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic iwlwifi intel_rapl snd_hda_intel iosf_mbi hid_logitech_hidpp snd_seq_midi cfg80211 x86_pkg_temp_thermal snd_hda_codec snd_seq_midi_event snd_hwdep btusb snd_rawmidi snd_hda_core btrtl coretemp snd_seq snd_pcm btbcm btintel joydev bluetooth ghash_clmulni_intel snd_timer shpchp thinkpad_acpi snd_seq_device nvram wmi snd soundcore mac_hid aesni_intel aes_x86_64 crypto_simd cryptd glue_helper kvm_intel
> > Jan  5 10:22:32 ray kernel: [  537.095222]  kvm irqbypass hid_generic hid_logitech_dj usbhid hid
> > Jan  5 10:22:32 ray kernel: [  537.096272] CPU: 2 PID: 23 Comm: kworker/2:0 Tainted: G        W       4.10.0-rc2 #47
> > Jan  5 10:22:32 ray kernel: [  537.097263] Hardware name: LENOVO 20F5S7V800/20F5S7V800, BIOS R02ET50W (1.23 ) 09/20/2016
> > Jan  5 10:22:32 ray kernel: [  537.098291] Workqueue: events drm_dp_destroy_connector_work
> > Jan  5 10:22:32 ray kernel: [  537.099328] task: ffff88040f2f1e00 task.stack: ffffc9000198c000
> > Jan  5 10:22:32 ray kernel: [  537.100335] RIP: 0010:device_del+0x19/0x330
> > Jan  5 10:22:32 ray kernel: [  537.101340] RSP: 0018:ffffc9000198fd58 EFLAGS: 00010282
> > Jan  5 10:22:32 ray kernel: [  537.102361] RAX: 0000000000000000 RBX: fffffffffffffffe RCX: ffff88040c5191b0
> > Jan  5 10:22:32 ray kernel: [  537.103418] RDX: ffffffff81cb6246 RSI: 0000000000000001 RDI: fffffffffffffffe
> > Jan  5 10:22:32 ray kernel: [  537.104473] RBP: ffffc9000198fd90 R08: 0000000000000000 R09: ffff880421517780
> > Jan  5 10:22:32 ray kernel: [  537.105574] R10: 0000007d0ce17c93 R11: 0000000000000001 R12: fffffffffffffffe
> > Jan  5 10:22:32 ray kernel: [  537.106636] R13: ffff88040ed36bd8 R14: ffff88040ed36788 R15: ffff88040c728810
> > Jan  5 10:22:32 ray kernel: [  537.107728] FS:  0000000000000000(0000) GS:ffff880421500000(0000) knlGS:0000000000000000
> > Jan  5 10:22:32 ray kernel: [  537.108812] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > Jan  5 10:22:32 ray kernel: [  537.109937] CR2: 000000000000009e CR3: 0000000384894000 CR4: 00000000003406e0
> > Jan  5 10:22:32 ray kernel: [  537.111038] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > Jan  5 10:22:32 ray kernel: [  537.112142] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Jan  5 10:22:32 ray kernel: [  537.113223] Call Trace:
> > Jan  5 10:22:32 ray kernel: [  537.114293]  device_unregister+0x12/0x30
> > Jan  5 10:22:32 ray kernel: [  537.115354]  drm_sysfs_connector_remove+0x3b/0x50
> > Jan  5 10:22:32 ray kernel: [  537.116391]  drm_connector_unregister.part.8+0x27/0x40
> > Jan  5 10:22:32 ray kernel: [  537.117433]  drm_connector_unregister+0x14/0x20
> > Jan  5 10:22:32 ray kernel: [  537.118478]  intel_dp_destroy_mst_connector+0x1a/0x80
> > Jan  5 10:22:32 ray kernel: [  537.119513]  drm_dp_destroy_connector_work+0xa9/0x150
> > Jan  5 10:22:32 ray kernel: [  537.120539]  process_one_work+0x14b/0x430
> > Jan  5 10:22:32 ray kernel: [  537.121568]  worker_thread+0x12b/0x4a0
> > Jan  5 10:22:32 ray kernel: [  537.122581]  kthread+0x10c/0x140
> > Jan  5 10:22:32 ray kernel: [  537.123583]  ? process_one_work+0x430/0x430
> > Jan  5 10:22:32 ray kernel: [  537.124584]  ? kthread_create_on_node+0x40/0x40
> > Jan  5 10:22:32 ray kernel: [  537.125574]  ret_from_fork+0x27/0x40
> > Jan  5 10:22:32 ray kernel: [  537.126562] Code: 00 00 00 00 00 00 00 5b 41 5c 41 5d 5d c3 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 48 83 ec 10 <48> 8b 87 a0 00 00 00 4c 8b 2f 48 85 c0 74 1b 48 8b b8 90 00 00
> > Jan  5 10:22:32 ray kernel: [  537.127644] RIP: device_del+0x19/0x330 RSP: ffffc9000198fd58
> > Jan  5 10:22:32 ray kernel: [  537.128690] CR2: 000000000000009e
> > Jan  5 10:22:32 ray kernel: [  537.129759] ---[ end trace 7e17c77627e8f513 ]---

> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> index aa64448..85beebc 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -2914,13 +2914,14 @@ static void drm_dp_destroy_connector_work(struct work_struct *work)
>  			break;
>  		}
>  		list_del(&port->next);
> -		mutex_unlock(&mgr->destroy_connector_lock);
>  
>  		kref_init(&port->kref);
>  		INIT_LIST_HEAD(&port->next);
>  
>  		mgr->cbs->destroy_connector(mgr, port->connector);
>  
> +		mutex_unlock(&mgr->destroy_connector_lock);

The lock here is just for port->next, and that should ensure that you're
not double-releasing the same connector. We do still have lifetime issues
with connectors in 4.10 (getting fixed in 4.11, it's a bit of a mess), but
I don't think those could be blamed for this oops.

The other funky thing is that this is from a worker, and it's the only
place that ever calls ->destroy_connector. It /should/ already be
single-threaded as far as connector destruction goes.

Can you pls do some printk tracing to make sure that without your patch
we're indeed releasing the same connector twice from this loop? I suspect
you're just ever-so-slightly shifting the timing and things blow up
somewhere else. But no idea where :(
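
To make the tracing idea concrete, here is a rough userspace model of the
loop, not the kernel code itself. All struct and function names below are
hypothetical stand-ins (the real code uses list_head, the
destroy_connector_lock mutex and the ->destroy_connector callback); the
fprintf() plays the role of the printk you'd add at the callback site. It
only shows what "same connector released twice" would look like in the
trace, under the assumption that a connector can end up queued more than
once.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for a connector, with a per-object call counter. */
struct connector {
	int id;
	int destroy_calls;	/* how many times destroy was invoked on it */
};

/* Hypothetical stand-in for an MST port queued for destruction. */
struct port {
	struct port *next;	/* models the port->next list linkage */
	struct connector *connector;
};

/* Hypothetical stand-in for the topology manager. */
struct mgr {
	struct port *destroy_list;	/* lock-protected in the real code */
	int double_releases;		/* repeats seen by the trace hook */
};

/* Trace hook: log every destroy call and flag a repeat on one connector. */
static void trace_destroy(struct mgr *mgr, struct connector *c)
{
	c->destroy_calls++;
	fprintf(stderr, "destroy_connector: connector %d (%p), call #%d\n",
		c->id, (void *)c, c->destroy_calls);
	if (c->destroy_calls > 1)
		mgr->double_releases++;
}

/* Put a port at the head of the destroy list. */
static void queue_port(struct mgr *mgr, struct port *port)
{
	port->next = mgr->destroy_list;
	mgr->destroy_list = port;
}

/* Drain the list one port at a time; returns the number processed. */
static int destroy_connector_work(struct mgr *mgr)
{
	int processed = 0;

	for (;;) {
		/* the real code takes the lock here */
		struct port *port = mgr->destroy_list;

		if (!port)
			break;
		mgr->destroy_list = port->next;
		port->next = NULL;
		/* ...and drops it here, before calling into the driver */

		trace_destroy(mgr, port->connector);
		processed++;
	}
	return processed;
}
```

If the trace for the unpatched kernel never shows a second call for the
same connector pointer, then the double release isn't coming from this
loop and the patch is most likely only moving the timing around.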
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

