[PATCH] Revert "drm/dp_mst: Skip validating ports during destruction, just ref"

Sean Paul sean at poorly.run
Wed Nov 28 21:04:15 UTC 2018


On Wed, Nov 28, 2018 at 04:00:05PM -0500, Lyude Paul wrote:
> This reverts commit:
> 
> c54c7374ff44 ("drm/dp_mst: Skip validating ports during destruction, just ref")
> 
> ugh.
> 
> In drm_dp_destroy_connector_work(), we have a pretty good chance of
> freeing the actual struct drm_dp_mst_port. However, after destroying
> things we send a hotplug through (*mgr->cbs->hotplug)(mgr) which is
> where the problems start.
> 
> For i915, this calls all the way down to the fbcon probing helpers,
> which start trying to access the port in a modeset.
> 
> [   45.062001] ==================================================================
> [   45.062112] BUG: KASAN: use-after-free in ex_handler_refcount+0x146/0x180
> [   45.062196] Write of size 4 at addr ffff8882b4b70968 by task kworker/3:1/53
> 
> [   45.062325] CPU: 3 PID: 53 Comm: kworker/3:1 Kdump: loaded Tainted: G           O      4.20.0-rc4Lyude-Test+ #3
> [   45.062442] Hardware name: LENOVO 20BWS1KY00/20BWS1KY00, BIOS JBET71WW (1.35 ) 09/14/2018
> [   45.062554] Workqueue: events drm_dp_destroy_connector_work [drm_kms_helper]
> [   45.062641] Call Trace:
> [   45.062685]  dump_stack+0xbd/0x15a
> [   45.062735]  ? dump_stack_print_info.cold.0+0x1b/0x1b
> [   45.062801]  ? printk+0x9f/0xc5
> [   45.062847]  ? kmsg_dump_rewind_nolock+0xe4/0xe4
> [   45.062909]  ? ex_handler_refcount+0x146/0x180
> [   45.062970]  print_address_description+0x71/0x239
> [   45.063036]  ? ex_handler_refcount+0x146/0x180
> [   45.063095]  kasan_report.cold.5+0x242/0x30b
> [   45.063155]  __asan_report_store4_noabort+0x1c/0x20
> [   45.063313]  ex_handler_refcount+0x146/0x180
> [   45.063371]  ? ex_handler_clear_fs+0xb0/0xb0
> [   45.063428]  fixup_exception+0x98/0xd7
> [   45.063484]  ? raw_notifier_call_chain+0x20/0x20
> [   45.063548]  do_trap+0x6d/0x210
> [   45.063605]  ? _GLOBAL__sub_I_65535_1_drm_dp_aux_unregister_devnode+0x2f/0x1c6 [drm_kms_helper]
> [   45.063732]  do_error_trap+0xc0/0x170
> [   45.063802]  ? _GLOBAL__sub_I_65535_1_drm_dp_aux_unregister_devnode+0x2f/0x1c6 [drm_kms_helper]
> [   45.063929]  do_invalid_op+0x3b/0x50
> [   45.063997]  ? _GLOBAL__sub_I_65535_1_drm_dp_aux_unregister_devnode+0x2f/0x1c6 [drm_kms_helper]
> [   45.064103]  invalid_op+0x14/0x20
> [   45.064162] RIP: 0010:_GLOBAL__sub_I_65535_1_drm_dp_aux_unregister_devnode+0x2f/0x1c6 [drm_kms_helper]
> [   45.064274] Code: 00 48 c7 c7 80 fe 53 a0 48 89 e5 e8 5b 6f 26 e1 5d c3 48 8d 0e 0f 0b 48 8d 0b 0f 0b 48 8d 0f 0f 0b 48 8d 0f 0f 0b 49 8d 4d 00 <0f> 0b 49 8d 0e 0f 0b 48 8d 08 0f 0b 49 8d 4d 00 0f 0b 48 8d 0b 0f
> [   45.064569] RSP: 0018:ffff8882b789ee10 EFLAGS: 00010282
> [   45.064637] RAX: ffff8882af47ae70 RBX: ffff8882af47aa60 RCX: ffff8882b4b70968
> [   45.064723] RDX: ffff8882af47ae70 RSI: 0000000000000008 RDI: ffff8882b788bdb8
> [   45.064808] RBP: ffff8882b789ee28 R08: ffffed1056f13db4 R09: ffffed1056f13db3
> [   45.064894] R10: ffffed1056f13db3 R11: ffff8882b789ed9f R12: ffff8882af47ad28
> [   45.064980] R13: ffff8882b4b70968 R14: ffff8882acd86728 R15: ffff8882b4b75dc8
> [   45.065084]  drm_dp_mst_reset_vcpi_slots+0x12/0x80 [drm_kms_helper]
> [   45.065225]  intel_mst_disable_dp+0xda/0x180 [i915]
> [   45.065361]  intel_encoders_disable.isra.107+0x197/0x310 [i915]
> [   45.065498]  haswell_crtc_disable+0xbe/0x400 [i915]
> [   45.065622]  ? i9xx_disable_plane+0x1c0/0x3e0 [i915]
> [   45.065750]  intel_atomic_commit_tail+0x74e/0x3e60 [i915]
> [   45.065884]  ? intel_pre_plane_update+0xbc0/0xbc0 [i915]
> [   45.065968]  ? drm_atomic_helper_swap_state+0x88b/0x1d90 [drm_kms_helper]
> [   45.066054]  ? kasan_check_write+0x14/0x20
> [   45.066165]  ? i915_gem_track_fb+0x13a/0x330 [i915]
> [   45.066277]  ? i915_sw_fence_complete+0xe9/0x140 [i915]
> [   45.066406]  ? __i915_sw_fence_complete+0xc50/0xc50 [i915]
> [   45.066540]  intel_atomic_commit+0x72e/0xef0 [i915]
> [   45.066635]  ? drm_dev_dbg+0x200/0x200 [drm]
> [   45.066764]  ? intel_atomic_commit_tail+0x3e60/0x3e60 [i915]
> [   45.066898]  ? intel_atomic_commit_tail+0x3e60/0x3e60 [i915]
> [   45.067001]  drm_atomic_commit+0xc4/0xf0 [drm]
> [   45.067074]  restore_fbdev_mode_atomic+0x562/0x780 [drm_kms_helper]
> [   45.067166]  ? drm_fb_helper_debug_leave+0x690/0x690 [drm_kms_helper]
> [   45.067249]  ? kasan_check_read+0x11/0x20
> [   45.067324]  restore_fbdev_mode+0x127/0x4b0 [drm_kms_helper]
> [   45.067364]  ? kasan_check_read+0x11/0x20
> [   45.067406]  drm_fb_helper_restore_fbdev_mode_unlocked+0x164/0x200 [drm_kms_helper]
> [   45.067462]  ? drm_fb_helper_hotplug_event+0x30/0x30 [drm_kms_helper]
> [   45.067508]  ? kasan_check_write+0x14/0x20
> [   45.070360]  ? mutex_unlock+0x22/0x40
> [   45.073748]  drm_fb_helper_set_par+0xb2/0xf0 [drm_kms_helper]
> [   45.075846]  drm_fb_helper_hotplug_event.part.33+0x1cd/0x290 [drm_kms_helper]
> [   45.078088]  drm_fb_helper_hotplug_event+0x1c/0x30 [drm_kms_helper]
> [   45.082614]  intel_fbdev_output_poll_changed+0x9f/0x140 [i915]
> [   45.087069]  drm_kms_helper_hotplug_event+0x67/0x90 [drm_kms_helper]
> [   45.089319]  intel_dp_mst_hotplug+0x37/0x50 [i915]
> [   45.091496]  drm_dp_destroy_connector_work+0x510/0x6f0 [drm_kms_helper]
> [   45.093675]  ? drm_dp_update_payload_part1+0x1220/0x1220 [drm_kms_helper]
> [   45.095851]  ? kasan_check_write+0x14/0x20
> [   45.098473]  ? kasan_check_read+0x11/0x20
> [   45.101155]  ? strscpy+0x17c/0x530
> [   45.103808]  ? __switch_to_asm+0x34/0x70
> [   45.106456]  ? syscall_return_via_sysret+0xf/0x7f
> [   45.109711]  ? read_word_at_a_time+0x20/0x20
> [   45.113138]  ? __switch_to_asm+0x40/0x70
> [   45.116529]  ? __switch_to_asm+0x34/0x70
> [   45.119891]  ? __switch_to_asm+0x40/0x70
> [   45.123224]  ? __switch_to_asm+0x34/0x70
> [   45.126540]  ? __switch_to_asm+0x34/0x70
> [   45.129824]  process_one_work+0x88d/0x15d0
> [   45.133172]  ? pool_mayday_timeout+0x850/0x850
> [   45.136459]  ? pci_mmcfg_check_reserved+0x110/0x128
> [   45.139739]  ? wake_q_add+0xb0/0xb0
> [   45.143010]  ? check_preempt_wakeup+0x652/0x1050
> [   45.146304]  ? worker_enter_idle+0x29e/0x740
> [   45.149589]  ? __schedule+0x1ec0/0x1ec0
> [   45.152937]  ? kasan_check_read+0x11/0x20
> [   45.156179]  ? _raw_spin_lock_irq+0xa3/0x130
> [   45.159382]  ? _raw_read_unlock_irqrestore+0x30/0x30
> [   45.162542]  ? kasan_check_write+0x14/0x20
> [   45.165657]  worker_thread+0x1a5/0x1470
> [   45.168725]  ? set_load_weight+0x2e0/0x2e0
> [   45.171755]  ? process_one_work+0x15d0/0x15d0
> [   45.174806]  ? __switch_to_asm+0x34/0x70
> [   45.177645]  ? __switch_to_asm+0x40/0x70
> [   45.180323]  ? __switch_to_asm+0x34/0x70
> [   45.182936]  ? __switch_to_asm+0x40/0x70
> [   45.185539]  ? __switch_to_asm+0x34/0x70
> [   45.188100]  ? __switch_to_asm+0x40/0x70
> [   45.190628]  ? __schedule+0x7d4/0x1ec0
> [   45.193143]  ? save_stack+0xa9/0xd0
> [   45.195632]  ? kasan_check_write+0x10/0x20
> [   45.198162]  ? kasan_kmalloc+0xc4/0xe0
> [   45.200609]  ? kmem_cache_alloc_trace+0xdd/0x190
> [   45.203046]  ? kthread+0x9f/0x3b0
> [   45.205470]  ? ret_from_fork+0x35/0x40
> [   45.207876]  ? unwind_next_frame+0x43/0x50
> [   45.210273]  ? __save_stack_trace+0x82/0x100
> [   45.212658]  ? deactivate_slab.isra.67+0x3d4/0x580
> [   45.215026]  ? default_wake_function+0x35/0x50
> [   45.217399]  ? kasan_check_read+0x11/0x20
> [   45.219825]  ? _raw_spin_lock_irqsave+0xae/0x140
> [   45.222174]  ? __lock_text_start+0x8/0x8
> [   45.224521]  ? replenish_dl_entity.cold.62+0x4f/0x4f
> [   45.226868]  ? __kthread_parkme+0x87/0xf0
> [   45.229200]  kthread+0x2f7/0x3b0
> [   45.231557]  ? process_one_work+0x15d0/0x15d0
> [   45.233923]  ? kthread_park+0x120/0x120
> [   45.236249]  ret_from_fork+0x35/0x40
> 
> [   45.240875] Allocated by task 242:
> [   45.243136]  save_stack+0x43/0xd0
> [   45.245385]  kasan_kmalloc+0xc4/0xe0
> [   45.247597]  kmem_cache_alloc_trace+0xdd/0x190
> [   45.249793]  drm_dp_add_port+0x1e0/0x2170 [drm_kms_helper]
> [   45.252000]  drm_dp_send_link_address+0x4a7/0x740 [drm_kms_helper]
> [   45.254389]  drm_dp_check_and_send_link_address+0x1a7/0x210 [drm_kms_helper]
> [   45.256803]  drm_dp_mst_link_probe_work+0x6f/0xb0 [drm_kms_helper]
> [   45.259200]  process_one_work+0x88d/0x15d0
> [   45.261597]  worker_thread+0x1a5/0x1470
> [   45.264038]  kthread+0x2f7/0x3b0
> [   45.266371]  ret_from_fork+0x35/0x40
> 
> [   45.270937] Freed by task 53:
> [   45.273170]  save_stack+0x43/0xd0
> [   45.275382]  __kasan_slab_free+0x139/0x190
> [   45.277604]  kasan_slab_free+0xe/0x10
> [   45.279826]  kfree+0x99/0x1b0
> [   45.282044]  drm_dp_free_mst_port+0x4a/0x60 [drm_kms_helper]
> [   45.284330]  drm_dp_destroy_connector_work+0x43e/0x6f0 [drm_kms_helper]
> [   45.286660]  process_one_work+0x88d/0x15d0
> [   45.288934]  worker_thread+0x1a5/0x1470
> [   45.291231]  kthread+0x2f7/0x3b0
> [   45.293547]  ret_from_fork+0x35/0x40
> 
> [   45.298206] The buggy address belongs to the object at ffff8882b4b70968
>                 which belongs to the cache kmalloc-2k of size 2048
> [   45.303047] The buggy address is located 0 bytes inside of
>                 2048-byte region [ffff8882b4b70968, ffff8882b4b71168)
> [   45.308010] The buggy address belongs to the page:
> [   45.310477] page:ffffea000ad2dc00 count:1 mapcount:0 mapping:ffff8882c080cf40 index:0x0 compound_mapcount: 0
> [   45.313051] flags: 0x8000000000010200(slab|head)
> [   45.315635] raw: 8000000000010200 ffffea000aac2808 ffffea000abe8608 ffff8882c080cf40
> [   45.318300] raw: 0000000000000000 00000000000d000d 00000001ffffffff 0000000000000000
> [   45.320966] page dumped because: kasan: bad access detected
> 
> [   45.326312] Memory state around the buggy address:
> [   45.329085]  ffff8882b4b70800: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [   45.331845]  ffff8882b4b70880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [   45.334584] >ffff8882b4b70900: fc fc fc fc fc fc fc fc fc fc fc fc fc fb fb fb
> [   45.337302]                                                           ^
> [   45.340061]  ffff8882b4b70980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [   45.342910]  ffff8882b4b70a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [   45.345748] ==================================================================
> 
> So, this definitely isn't a fix that we want. This being said; there's
> no real easy fix for this problem because of some of the catch-22's of
> the MST helpers current design. For starters; we always need to validate
> a port with drm_dp_get_validated_port_ref(), but validation relies on
> the lifetime of the port in the actual topology. So once the port is
> gone, it can't be validated again.
> 
> If we were to try to make the payload helpers not use port validation,
> then we'd cause another problem: if the port isn't validated, it could
> be freed and we'd just start causing more KASAN issues. There are
> already hacks that attempt to workaround this in
> drm_dp_mst_destroy_connector_work() by re-initializing the kref so that
> it can be used again and it's memory can be freed once the VCPI helpers
> finish removing the port's respective payloads. But none of these really
> do anything helpful since the port still can't be validated since it's
> gone from the topology. Also, that workaround is immensely confusing to
> read through.
> 
> What really needs to be done in order to fix this is to teach DRM how to
> track the lifetime of the structs for MST ports and branch devices
> seperately from their lifetime in the actual topology. Simply put; this
> means having two different krefs-one that removes the port/branch device
> from the topology, and one that finally calls kfree(). This would let us
> simplify things, since we'd now be able to keep ports around without
> having to keep them in the topology at the same time, which is exactly
> what we need in order to teach our VCPI helpers to only validate ports
> when it's actually necessary without running the risk of trying to use
> unallocated memory.
> 
> Such a fix is on it's way, but for now let's play it safe and just
> revert this. If this bug has been around for well over a year, we can
> wait a little while to get an actual proper fix here.
> 
> Signed-off-by: Lyude Paul <lyude at redhat.com>
> Fixes: c54c7374ff44 ("drm/dp_mst: Skip validating ports during destruction, just ref")
> Cc: Daniel Vetter <daniel at ffwll.ch>
> Cc: Sean Paul <sean at poorly.run>
> Cc: Jerry Zuo <Jerry.Zuo at amd.com>
> Cc: Harry Wentland <Harry.Wentland at amd.com>
> Cc: stable at vger.kernel.org # v4.6+

Acked-by: Sean Paul <sean at poorly.run>

> ---
>  drivers/gpu/drm/drm_dp_mst_topology.c | 15 ++-------------
>  1 file changed, 2 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> index 08978ad72f33..529414556962 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -1023,20 +1023,9 @@ static struct drm_dp_mst_port *drm_dp_mst_get_port_ref_locked(struct drm_dp_mst_
>  static struct drm_dp_mst_port *drm_dp_get_validated_port_ref(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_port *port)
>  {
>  	struct drm_dp_mst_port *rport = NULL;
> -
>  	mutex_lock(&mgr->lock);
> -	/*
> -	 * Port may or may not be 'valid' but we don't care about that when
> -	 * destroying the port and we are guaranteed that the port pointer
> -	 * will be valid until we've finished
> -	 */
> -	if (current_work() == &mgr->destroy_connector_work) {
> -		kref_get(&port->kref);
> -		rport = port;
> -	} else if (mgr->mst_primary) {
> -		rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary,
> -						       port);
> -	}
> +	if (mgr->mst_primary)
> +		rport = drm_dp_mst_get_port_ref_locked(mgr->mst_primary, port);
>  	mutex_unlock(&mgr->lock);
>  	return rport;
>  }
> -- 
> 2.19.2
> 

-- 
Sean Paul, Software Engineer, Google / Chromium OS


More information about the dri-devel mailing list