[PATCH] drm: Fix oops + Xserver hang when unplugging USB drm devices
Chris Wilson
chris at chris-wilson.co.uk
Thu Jun 1 12:14:56 UTC 2017
On Thu, Jun 01, 2017 at 01:54:30PM +0200, Hans de Goede wrote:
> commit a39be606f99d ("drm: Do a full device unregister when unplugging")
> causes backtraces like this one when unplugging an usb drm device while
> it is in use:
>
> usb 2-3: USB disconnect, device number 25
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 242 at drivers/gpu/drm/drm_mode_config.c:424
> drm_mode_config_cleanup+0x220/0x280 [drm]
> ...
> RIP: 0010:drm_mode_config_cleanup+0x220/0x280 [drm]
> ...
> Call Trace:
> gm12u320_modeset_cleanup+0xe/0x10 [gm12u320]
> gm12u320_driver_unload+0x35/0x70 [gm12u320]
> drm_dev_unregister+0x3c/0xe0 [drm]
> drm_unplug_dev+0x12/0x60 [drm]
> gm12u320_usb_disconnect+0x36/0x40 [gm12u320]
> usb_unbind_interface+0x72/0x280
> device_release_driver_internal+0x158/0x210
> device_release_driver+0x12/0x20
> bus_remove_device+0x104/0x180
> device_del+0x1d2/0x350
> usb_disable_device+0x9f/0x270
> usb_disconnect+0xc6/0x260
> ...
> [drm:drm_mode_config_cleanup [drm]] *ERROR* connector Unknown-1 leaked!
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 242 at drivers/gpu/drm/drm_mode_config.c:458
> drm_mode_config_cleanup+0x268/0x280 [drm]
> ...
> <same Call Trace>
> ---[ end trace 80df975dae439ed6 ]---
> general protection fault: 0000 [#1] SMP
> ...
> Call Trace:
> ? __switch_to+0x225/0x450
> drm_mode_rmfb_work_fn+0x55/0x70 [drm]
> process_one_work+0x193/0x3c0
> worker_thread+0x4a/0x3a0
> ...
> RIP: drm_framebuffer_remove+0x62/0x3f0 [drm] RSP: ffffb776c39dfd98
> ---[ end trace 80df975dae439ed7 ]---
>
> After which the system is unusable this is caused by drm_dev_unregister
> getting called immediately on unplug, which calls the drivers unload
> function which calls drm_mode_config_cleanup which removes the framebuffer
> object while userspace is still holding a reference to it.
>
> Reverting commit a39be606f99d ("drm: Do a full device unregister
> when unplugging") leads to the following oops on unplug instead,
> when userspace closes the last fd referencing the drm_dev:
>
> sysfs group 'power' not found for kobject 'card1-Unknown-1'
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 2459 at fs/sysfs/group.c:237
> sysfs_remove_group+0x80/0x90
> ...
> RIP: 0010:sysfs_remove_group+0x80/0x90
> ...
> Call Trace:
> dpm_sysfs_remove+0x57/0x60
> device_del+0xfd/0x350
> device_unregister+0x1a/0x60
> drm_sysfs_connector_remove+0x39/0x50 [drm]
> drm_connector_unregister+0x5a/0x70 [drm]
> drm_connector_unregister_all+0x45/0xa0 [drm]
> drm_modeset_unregister_all+0x12/0x30 [drm]
> drm_dev_unregister+0xca/0xe0 [drm]
> drm_put_dev+0x32/0x60 [drm]
> drm_release+0x2f3/0x380 [drm]
> __fput+0xdf/0x1e0
> ...
> ---[ end trace ecfb91ac85688bbe ]---
> BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
> IP: down_write+0x1f/0x40
> ...
> Call Trace:
> debugfs_remove_recursive+0x55/0x1b0
> drm_debugfs_connector_remove+0x21/0x40 [drm]
> drm_connector_unregister+0x62/0x70 [drm]
> drm_connector_unregister_all+0x45/0xa0 [drm]
> drm_modeset_unregister_all+0x12/0x30 [drm]
> drm_dev_unregister+0xca/0xe0 [drm]
> drm_put_dev+0x32/0x60 [drm]
> drm_release+0x2f3/0x380 [drm]
> __fput+0xdf/0x1e0
> ...
> ---[ end trace ecfb91ac85688bbf ]---
>
> This is caused by the revert moving back to drm_unplug_dev calling
> drm_minor_unregister which does:
>
> device_del(minor->kdev);
> dev_set_drvdata(minor->kdev, NULL); /* safety belt */
> drm_debugfs_cleanup(minor);
>
> Causing the sysfs entries to already be removed even though we still
> have references to them in e.g. drm_connector.
>
> Note we must call drm_minor_unregister to notify userspace of the unplug
> of the device, so calling drm_dev_unregister is not completely wrong the
> problem is that drm_dev_unregister does too much.
>
> This commit fixes drm_unplug_dev by not only reverting
> commit a39be606f99d ("drm: Do a full device unregister when unplugging")
> but by also adding a call to drm_modeset_unregister_all before the
> drm_minor_unregister calls to make sure all sysfs entries are removed
> before calling device_del(minor->kdev) thereby also fixing the second
> set of oopses caused by just reverting the commit.
That really does sound like a step in the wrong direction though. But
that seems to because of the call to driver->unload() from the middle of
unregister that is catching me by surprise.
Regression fix trumps niceties, so
Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>
Looks like about 50% of the remaining driver->unload callbacks can be
replaced by a driver->release callback. The rest need care to finish
demidlayering driver->load/unload. And I still think unplug needs some
polish.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
More information about the dri-devel
mailing list