[PATCH] drm: Fix oops + Xserver hang when unplugging USB drm devices

Sean Paul seanpaul at chromium.org
Fri Jun 2 17:10:56 UTC 2017


On Thu, Jun 01, 2017 at 01:14:56PM +0100, Chris Wilson wrote:
> On Thu, Jun 01, 2017 at 01:54:30PM +0200, Hans de Goede wrote:
> > commit a39be606f99d ("drm: Do a full device unregister when unplugging")
> > causes backtraces like this one when unplugging an usb drm device while
> > it is in use:
> > 
> > usb 2-3: USB disconnect, device number 25
> > ------------[ cut here ]------------
> > WARNING: CPU: 0 PID: 242 at drivers/gpu/drm/drm_mode_config.c:424
> >    drm_mode_config_cleanup+0x220/0x280 [drm]
> > ...
> > RIP: 0010:drm_mode_config_cleanup+0x220/0x280 [drm]
> > ...
> > Call Trace:
> >  gm12u320_modeset_cleanup+0xe/0x10 [gm12u320]
> >  gm12u320_driver_unload+0x35/0x70 [gm12u320]
> >  drm_dev_unregister+0x3c/0xe0 [drm]
> >  drm_unplug_dev+0x12/0x60 [drm]
> >  gm12u320_usb_disconnect+0x36/0x40 [gm12u320]
> >  usb_unbind_interface+0x72/0x280
> >  device_release_driver_internal+0x158/0x210
> >  device_release_driver+0x12/0x20
> >  bus_remove_device+0x104/0x180
> >  device_del+0x1d2/0x350
> >  usb_disable_device+0x9f/0x270
> >  usb_disconnect+0xc6/0x260
> > ...
> > [drm:drm_mode_config_cleanup [drm]] *ERROR* connector Unknown-1 leaked!
> > ------------[ cut here ]------------
> > WARNING: CPU: 0 PID: 242 at drivers/gpu/drm/drm_mode_config.c:458
> >    drm_mode_config_cleanup+0x268/0x280 [drm]
> > ...
> > <same Call Trace>
> > ---[ end trace 80df975dae439ed6 ]---
> > general protection fault: 0000 [#1] SMP
> > ...
> > Call Trace:
> >  ? __switch_to+0x225/0x450
> >  drm_mode_rmfb_work_fn+0x55/0x70 [drm]
> >  process_one_work+0x193/0x3c0
> >  worker_thread+0x4a/0x3a0
> > ...
> > RIP: drm_framebuffer_remove+0x62/0x3f0 [drm] RSP: ffffb776c39dfd98
> > ---[ end trace 80df975dae439ed7 ]---
> > 
> > After which the system is unusable this is caused by drm_dev_unregister
> > getting called immediately on unplug, which calls the drivers unload
> > function which calls drm_mode_config_cleanup which removes the framebuffer
> > object while userspace is still holding a reference to it.
> > 
> > Reverting commit a39be606f99d ("drm: Do a full device unregister
> > when unplugging") leads to the following oops on unplug instead,
> > when userspace closes the last fd referencing the drm_dev:
> > 
> > sysfs group 'power' not found for kobject 'card1-Unknown-1'
> > ------------[ cut here ]------------
> > WARNING: CPU: 0 PID: 2459 at fs/sysfs/group.c:237
> >    sysfs_remove_group+0x80/0x90
> > ...
> > RIP: 0010:sysfs_remove_group+0x80/0x90
> > ...
> > Call Trace:
> >  dpm_sysfs_remove+0x57/0x60
> >  device_del+0xfd/0x350
> >  device_unregister+0x1a/0x60
> >  drm_sysfs_connector_remove+0x39/0x50 [drm]
> >  drm_connector_unregister+0x5a/0x70 [drm]
> >  drm_connector_unregister_all+0x45/0xa0 [drm]
> >  drm_modeset_unregister_all+0x12/0x30 [drm]
> >  drm_dev_unregister+0xca/0xe0 [drm]
> >  drm_put_dev+0x32/0x60 [drm]
> >  drm_release+0x2f3/0x380 [drm]
> >  __fput+0xdf/0x1e0
> > ...
> > ---[ end trace ecfb91ac85688bbe ]---
> > BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
> > IP: down_write+0x1f/0x40
> > ...
> > Call Trace:
> >  debugfs_remove_recursive+0x55/0x1b0
> >  drm_debugfs_connector_remove+0x21/0x40 [drm]
> >  drm_connector_unregister+0x62/0x70 [drm]
> >  drm_connector_unregister_all+0x45/0xa0 [drm]
> >  drm_modeset_unregister_all+0x12/0x30 [drm]
> >  drm_dev_unregister+0xca/0xe0 [drm]
> >  drm_put_dev+0x32/0x60 [drm]
> >  drm_release+0x2f3/0x380 [drm]
> >  __fput+0xdf/0x1e0
> > ...
> > ---[ end trace ecfb91ac85688bbf ]---
> > 
> > This is caused by the revert moving back to drm_unplug_dev calling
> > drm_minor_unregister which does:
> > 
> >         device_del(minor->kdev);
> >         dev_set_drvdata(minor->kdev, NULL); /* safety belt */
> >         drm_debugfs_cleanup(minor);
> > 
> > Causing the sysfs entries to already be removed even though we still
> > have references to them in e.g. drm_connector.
> > 
> > Note we must call drm_minor_unregister to notify userspace of the unplug
> > of the device, so calling drm_dev_unregister is not completely wrong the
> > problem is that drm_dev_unregister does too much.
> > 
> > This commit fixes drm_unplug_dev by not only reverting
> > commit a39be606f99d ("drm: Do a full device unregister when unplugging")
> > but by also adding a call to drm_modeset_unregister_all before the
> > drm_minor_unregister calls to make sure all sysfs entries are removed
> > before calling device_del(minor->kdev) thereby also fixing the second
> > set of oopses caused by just reverting the commit.
> 
> That really does sound like a step in the wrong direction though. But
> that seems to because of the call to driver->unload() from the middle of
> unregister that is catching me by surprise.
> 
> Regression fix trumps niceties, so
> Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>

Indeed it does seem like we'll need to sort this out, but agreed let's
put out the fire before renovating the house.

Applied to -misc-fixes

Sean


> 
> Looks like about 50% of the remaining driver->unload callbacks can be
> replaced by a driver->release callback. The rest need care to finish
> demidlayering driver->load/unload. And I still think unplug needs some
> polish.
> -Chris
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre

-- 
Sean Paul, Software Engineer, Google / Chromium OS


More information about the dri-devel mailing list