[PATCH] drm: Fix oops + Xserver hang when unplugging USB drm devices

Hans de Goede hdegoede at redhat.com
Thu Jun 1 12:30:15 UTC 2017


Hi,

On 01-06-17 14:14, Chris Wilson wrote:
> On Thu, Jun 01, 2017 at 01:54:30PM +0200, Hans de Goede wrote:
>> commit a39be606f99d ("drm: Do a full device unregister when unplugging")
>> causes backtraces like this one when unplugging an usb drm device while
>> it is in use:
>>
>> usb 2-3: USB disconnect, device number 25
>> ------------[ cut here ]------------
>> WARNING: CPU: 0 PID: 242 at drivers/gpu/drm/drm_mode_config.c:424
>>     drm_mode_config_cleanup+0x220/0x280 [drm]
>> ...
>> RIP: 0010:drm_mode_config_cleanup+0x220/0x280 [drm]
>> ...
>> Call Trace:
>>   gm12u320_modeset_cleanup+0xe/0x10 [gm12u320]
>>   gm12u320_driver_unload+0x35/0x70 [gm12u320]
>>   drm_dev_unregister+0x3c/0xe0 [drm]
>>   drm_unplug_dev+0x12/0x60 [drm]
>>   gm12u320_usb_disconnect+0x36/0x40 [gm12u320]
>>   usb_unbind_interface+0x72/0x280
>>   device_release_driver_internal+0x158/0x210
>>   device_release_driver+0x12/0x20
>>   bus_remove_device+0x104/0x180
>>   device_del+0x1d2/0x350
>>   usb_disable_device+0x9f/0x270
>>   usb_disconnect+0xc6/0x260
>> ...
>> [drm:drm_mode_config_cleanup [drm]] *ERROR* connector Unknown-1 leaked!
>> ------------[ cut here ]------------
>> WARNING: CPU: 0 PID: 242 at drivers/gpu/drm/drm_mode_config.c:458
>>     drm_mode_config_cleanup+0x268/0x280 [drm]
>> ...
>> <same Call Trace>
>> ---[ end trace 80df975dae439ed6 ]---
>> general protection fault: 0000 [#1] SMP
>> ...
>> Call Trace:
>>   ? __switch_to+0x225/0x450
>>   drm_mode_rmfb_work_fn+0x55/0x70 [drm]
>>   process_one_work+0x193/0x3c0
>>   worker_thread+0x4a/0x3a0
>> ...
>> RIP: drm_framebuffer_remove+0x62/0x3f0 [drm] RSP: ffffb776c39dfd98
>> ---[ end trace 80df975dae439ed7 ]---
>>
>> After which the system is unusable this is caused by drm_dev_unregister
>> getting called immediately on unplug, which calls the drivers unload
>> function which calls drm_mode_config_cleanup which removes the framebuffer
>> object while userspace is still holding a reference to it.
>>
>> Reverting commit a39be606f99d ("drm: Do a full device unregister
>> when unplugging") leads to the following oops on unplug instead,
>> when userspace closes the last fd referencing the drm_dev:
>>
>> sysfs group 'power' not found for kobject 'card1-Unknown-1'
>> ------------[ cut here ]------------
>> WARNING: CPU: 0 PID: 2459 at fs/sysfs/group.c:237
>>     sysfs_remove_group+0x80/0x90
>> ...
>> RIP: 0010:sysfs_remove_group+0x80/0x90
>> ...
>> Call Trace:
>>   dpm_sysfs_remove+0x57/0x60
>>   device_del+0xfd/0x350
>>   device_unregister+0x1a/0x60
>>   drm_sysfs_connector_remove+0x39/0x50 [drm]
>>   drm_connector_unregister+0x5a/0x70 [drm]
>>   drm_connector_unregister_all+0x45/0xa0 [drm]
>>   drm_modeset_unregister_all+0x12/0x30 [drm]
>>   drm_dev_unregister+0xca/0xe0 [drm]
>>   drm_put_dev+0x32/0x60 [drm]
>>   drm_release+0x2f3/0x380 [drm]
>>   __fput+0xdf/0x1e0
>> ...
>> ---[ end trace ecfb91ac85688bbe ]---
>> BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
>> IP: down_write+0x1f/0x40
>> ...
>> Call Trace:
>>   debugfs_remove_recursive+0x55/0x1b0
>>   drm_debugfs_connector_remove+0x21/0x40 [drm]
>>   drm_connector_unregister+0x62/0x70 [drm]
>>   drm_connector_unregister_all+0x45/0xa0 [drm]
>>   drm_modeset_unregister_all+0x12/0x30 [drm]
>>   drm_dev_unregister+0xca/0xe0 [drm]
>>   drm_put_dev+0x32/0x60 [drm]
>>   drm_release+0x2f3/0x380 [drm]
>>   __fput+0xdf/0x1e0
>> ...
>> ---[ end trace ecfb91ac85688bbf ]---
>>
>> This is caused by the revert moving back to drm_unplug_dev calling
>> drm_minor_unregister which does:
>>
>>          device_del(minor->kdev);
>>          dev_set_drvdata(minor->kdev, NULL); /* safety belt */
>>          drm_debugfs_cleanup(minor);
>>
>> Causing the sysfs entries to already be removed even though we still
>> have references to them in e.g. drm_connector.
>>
>> Note we must call drm_minor_unregister to notify userspace of the unplug
>> of the device, so calling drm_dev_unregister is not completely wrong the
>> problem is that drm_dev_unregister does too much.
>>
>> This commit fixes drm_unplug_dev by not only reverting
>> commit a39be606f99d ("drm: Do a full device unregister when unplugging")
>> but by also adding a call to drm_modeset_unregister_all before the
>> drm_minor_unregister calls to make sure all sysfs entries are removed
>> before calling device_del(minor->kdev) thereby also fixing the second
>> set of oopses caused by just reverting the commit.
> 
> That really does sound like a step in the wrong direction though. But
> that seems to because of the call to driver->unload() from the middle of
> unregister that is catching me by surprise. >
> Regression fix trumps niceties, so

Right, I'm not entirely happy with this fix too, but I decided to go for
the simplest fix possible to get the regression fixed.

As you've seen in my reply in the "[PATCH v11] drm: Unplug drm device when
unregistering it (v8)" thread I have been thinking about nicer ways
to fix this too. Lets continue the discussion about doing a better fix
for a future kernel release there.

> Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>

Thank you.

> Looks like about 50% of the remaining driver->unload callbacks can be
> replaced by a driver->release callback. The rest need care to finish
> demidlayering driver->load/unload. And I still think unplug needs some
> polish.

Regards,

Hans


More information about the dri-devel mailing list