devm actions and hw clenaup (was Re: [PATCH 01/11] drm: Add devm_drm_dev_init/register)

Noralf Trønnes noralf at tronnes.org
Tue Jan 29 17:26:49 UTC 2019



Den 29.01.2019 17.50, skrev Daniel Vetter:
> On Tue, Jan 29, 2019 at 03:34:46PM +0100, Noralf Trønnes wrote:
>>
>>
>> Den 24.01.2019 18.57, skrev Daniel Vetter:
>>> On Thu, Jan 24, 2019 at 6:46 PM Greg KH <gregkh at linuxfoundation.org> wrote:
>>>>
>>>> On Thu, Jan 24, 2019 at 11:43:12AM +0100, Daniel Vetter wrote:
>>>>> On Wed, Jan 23, 2019 at 11:54:07AM +0100, Noralf Trønnes wrote:
>>>>>>
>>>>>>
>>>>>> Den 22.01.2019 20.30, skrev Daniel Vetter:
>>>>>>> On Tue, Jan 22, 2019 at 8:07 PM Noralf Trønnes <noralf at tronnes.org> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Den 22.01.2019 10.32, skrev Daniel Vetter:
>>>>>>>>> On Mon, Jan 21, 2019 at 01:21:46PM +0100, Noralf Trønnes wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Den 21.01.2019 10.55, skrev Daniel Vetter:
>>>>>>>>>>> On Mon, Jan 21, 2019 at 10:10:14AM +0100, Daniel Vetter wrote:
>>>>>>>>>>>> On Sun, Jan 20, 2019 at 12:43:08PM +0100, Noralf Trønnes wrote:
>>>>>>>>>>>>> This adds resource managed (devres) versions of drm_dev_init() and
>>>>>>>>>>>>> drm_dev_register().
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also added is devm_drm_dev_register_with_fbdev() which sets up generic
>>>>>>>>>>>>> fbdev emulation as well.
>>>>>>>>>>>>>
>>>>>>>>>>>>> devm_drm_dev_register() isn't exported since there are no users.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Noralf Trønnes <noralf at tronnes.org>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> <snip>
>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
>>>>>>>>>>>>> index 381581b01d48..12129772be45 100644
>>>>>>>>>>>>> --- a/drivers/gpu/drm/drm_drv.c
>>>>>>>>>>>>> +++ b/drivers/gpu/drm/drm_drv.c
>>>>>>>>>>>>> @@ -36,6 +36,7 @@
>>>>>>>>>>>>>
>>>>>>>>>>>>>  #include <drm/drm_client.h>
>>>>>>>>>>>>>  #include <drm/drm_drv.h>
>>>>>>>>>>>>> +#include <drm/drm_fb_helper.h>
>>>>>>>>>>>>>  #include <drm/drmP.h>
>>>>>>>>>>>>>
>>>>>>>>>>>>>  #include "drm_crtc_internal.h"
>>>>>>>>>>>>> @@ -871,6 +872,111 @@ void drm_dev_unregister(struct drm_device *dev)
>>>>>>>>>>>>>  }
>>>>>>>>>>>>>  EXPORT_SYMBOL(drm_dev_unregister);
>>>>>>>>>>>>>
>>>>>>>>>>>>> +static void devm_drm_dev_init_release(void *data)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>> + drm_dev_put(data);
>>>>>>>>>>>
>>>>>>>>>>> We need drm_dev_unplug() here, or this isn't safe.
>>>>>>>>>>
>>>>>>>>>> This function is only used to cover the error path if probe fails before
>>>>>>>>>> devm_drm_dev_register() is called. devm_drm_dev_register_release() is
>>>>>>>>>> the one that calls unplug. There are comments about this in the functions.
>>>>>>>>>
>>>>>>>>> I think I get a prize for being ignorant and blind :-/
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> +}
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +/**
>>>>>>>>>>>>> + * devm_drm_dev_init - Resource managed drm_dev_init()
>>>>>>>>>>>>> + * @parent: Parent device object
>>>>>>>>>>>>> + * @dev: DRM device
>>>>>>>>>>>>> + * @driver: DRM driver
>>>>>>>>>>>>> + *
>>>>>>>>>>>>> + * Managed drm_dev_init(). The DRM device initialized with this function is
>>>>>>>>>>>>> + * automatically released on driver detach. You must supply a
>>>>>>>>>>>
>>>>>>>>>>> I think a bit more clarity here would be good:
>>>>>>>>>>>
>>>>>>>>>>> "... automatically released on driver unbind by callind drm_dev_unplug()."
>>>>>>>>>>>
>>>>>>>>>>>>> + * &drm_driver.release callback to control the finalization explicitly.
>>>>>>>>>>>
>>>>>>>>>>> I think a loud warning for these is in order:
>>>>>>>>>>>
>>>>>>>>>>> "WARNING:
>>>>>>>>>>>
>>>>>>>>>>> "In generally it is unsafe to use devm functions for drm structures
>>>>>>>>>>> because the lifetimes of &drm_device and the underlying &device do not
>>>>>>>>>>> match. This here works because it doesn't immediately free anything, but
>>>>>>>>>>> only calls drm_dev_unplug(), which internally decrements the &drm_device
>>>>>>>>>>> refcount through drm_dev_put().
>>>>>>>>>>>
>>>>>>>>>>> "All other drm structures must still be explicitly released in the
>>>>>>>>>>> &drm_driver.release callback."
>>>>>>>>>>>
>>>>>>>>>>> While thinking about this I just realized that with this design we have no
>>>>>>>>>>> good place to call drm_atomic_helper_shutdown(). Which we need to, or all
>>>>>>>>>>> kinds of things will leak badly (connectors, fb, ...), but there's no
>>>>>>>>>>> place to call it:
>>>>>>>>>>> - unbind is too early, since we haven't yet called drm_dev_unplug, and the
>>>>>>>>>>>   drm_dev_unregister in there must be called _before_ we start to shut
>>>>>>>>>>>   down anything.
>>>>>>>>>>> - drm_driver.release is way too late.
>>>>>>>>>>>
>>>>>>>>>>> Ofc for a real hotunplug there's no point in shutting down the hw (it's
>>>>>>>>>>> already gone), but for a driver unload/unbind it would be nice if this
>>>>>>>>>>> happens automatically and in the right order.
>>>>>>>>>>>
>>>>>>>>>>> So not sure what to do here really.
>>>>>>>>>>
>>>>>>>>>> How about this change: (it breaks the rule of pulling helpers into the
>>>>>>>>>> core, so maybe we should put the devm_ functions into the simple KMS
>>>>>>>>>> helper instead?)
>>>>>>>>>
>>>>>>>>> Yeah smells a bit much like midlayer ... What would work is having a pile
>>>>>>>>> more devm_ helper functions, so that we onion-unwrap everything correctly,
>>>>>>>>> and in the right order. So:
>>>>>>>>>
>>>>>>>>> - devm_drm_dev_init (always does a drm_dev_put())
>>>>>>>>>
>>>>>>>>> - devm_drm_poll_enable (shuts down the poll helper with a devm action)
>>>>>>>>>
>>>>>>>>> - devm_drm_mode_config_reset (does an atomic_helper_shutdown() as it's cleanup action)
>>>>>>>>>
>>>>>>>>> - devm_drm_dev_register (grabs an additional drm_dev_get() reference so it
>>>>>>>>>   can call drm_dev_unplug() unconditionally).
>>>>>>>>>
>>>>>>>>
>>>>>>>> Beautiful! I really like this, it's very flexible.
>>>>>>>>
>>>>>>>> Where should devm_drm_mode_config_reset() live? It will pull in the
>>>>>>>> atomic helper...
>>>>>>>
>>>>>>> I think a new drm_devm.c helper would be nice for all this stuff.
>>>>>>> Especially since you can't freely mix devm-based setup/cleanup with
>>>>>>> normal cleanup I think it'd be good to have it all together in one
>>>>>>> place. And perhaps even a code example in the DOC: overview.
>>>>>>>
>>>>>>>>> We'd need to make sure some of the cleanup actions dtrt when the device is
>>>>>>>>> gone, but I think we can achieve that by liberally sprinkling
>>>>>>>>> drm_dev_enter/exit over them, e.g. the the cleanup action for
>>>>>>>>> drm_mode_config_reset would be:
>>>>>>>>>
>>>>>>>>> {
>>>>>>>>>       if (drm_dev_enter())
>>>>>>>>>               return;
>>>>>>>>>
>>>>>>>>>       drm_atomic_helper_shutdown();
>>>>>>>>>
>>>>>>>>>       drm_dev_exit();
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>
>>>>>>>> drm_dev_enter() can only be used to check whether the drm_device is
>>>>>>>> registered or not, it doesn't say anything about the state of the parent
>>>>>>>> device.
>>>>>>>>
>>>>>>>> All we know is that the device is being unbound from the driver, we
>>>>>>>> don't know if it's the device that's being removed or if it's the driver
>>>>>>>> that's unregistered.
>>>>>>>
>>>>>>> You're right, both paths will have called drm_dev_unplug by then.
>>>>>>> Silly me. I really liked my idea :-)
>>>>>>>
>>>>>>>> I have looked at the various call chains:
>>>>>>>>
>>>>>>>> driver_unregister ->
>>>>>>>>     bus_remove_driver ->
>>>>>>>>         driver_detach ->
>>>>>>>>             device_release_driver_internal
>>>>>>>>
>>>>>>>> device_unregister ->
>>>>>>>>     device_del ->
>>>>>>>>         bus_remove_device ->
>>>>>>>>             device_release_driver ->
>>>>>>>>                 device_release_driver_internal
>>>>>>>>
>>>>>>>> sysfs: unbind_store ->
>>>>>>>>     device_release_driver ->
>>>>>>>>         device_release_driver_internal
>>>>>>>>
>>>>>>>> The only way I've found to differentiate between these in a cleanup
>>>>>>>> action is that device_del() uses the bus notifier to signal
>>>>>>>> BUS_NOTIFY_DEL_DEVICE before calling bus_remove_device(). Such a
>>>>>>>> notifier could be used to set a drm_device->parent_removed flag.
>>>>>>>
>>>>>>> Hm, this might upset Greg KH's code taste ... maybe there's a better
>>>>>>> way to do this, but best to prototype a patch with this, send it to
>>>>>>> him and ask how to :-)
>>>>>>>
>>>>>>
>>>>>> I'll leave this to the one that needs it. The tinydrm drivers doesn't
>>>>>> need to touch hw after DRM unregister.
>>>>>>
>>>>>>>> Why is it necessary to call drm_atomic_helper_shutdown() here? Doesn't
>>>>>>>> everything get disabled when userspace closes? It does in my tinydrm
>>>>>>>> world :-)
>>>>>>>
>>>>>>> Iirc fbdev/fbcon can result in leaks ... at least we've had patches
>>>>>>> where drivers leaked drm_connector and drm_framebuffer objects, and
>>>>>>> they've been fixed by calling drm_atomic_helper_shutdown() in the
>>>>>>> unload path. Maybe this is cargo-culting, but it goes way back to
>>>>>>> pre-atomic, where drivers called drm_helper_force_disable_all().
>>>>>>>
>>>>>>> If you try to move the fbcon to your tinydrm drivers (con2fb is
>>>>>>> apparently the cmdline tool you need, never tried it, I only switch
>>>>>>> the kernel's console between fbcon and dummycon and back, not what
>>>>>>> fbcon drivers itself), then I think you should be able to reproduce.
>>>>>>> And maybe you have a better idea how to deal with this all.
>>>>>>>
>>>>>>> Note also that there's been proposals floating around to only close an
>>>>>>> drm_framebuffer, not also remove it (like the current RMFB ioctl
>>>>>>> does), with that closing userspace would not necessarily lead to a
>>>>>>> full cleanup.
>>>>>>>
>>>>>>> Another thing (which doesn't apply to drm_simple_display_pipe drivers)
>>>>>>> is if you have the display on, but no planes showing (i.e. all black).
>>>>>>> Then all the fbs will be cleaned up, but drm_connector will be
>>>>>>> leaking. That's a case where you need drm_atomic_helper_shutdown()
>>>>>>> even if fbcon/fbdev isn't even enabled.
>>>>>>>
>>>>>>
>>>>>> Ok, this means that I don't need to call drm_atomic_helper_shutdown() in
>>>>>> tinydrm. DRM userspace disables the pipe on close and the generic fbdev
>>>>>> emulation also releases everything.
>>>>>> Even so, maybe I should use devm_drm_mode_config_reset() after all to
>>>>>> keep drivers uniform, to avoid confusion: why doesn't he use it?
>>>>>
>>>>> Hm maybe there is an official way to solve this, pulling in Greg+Rafael.
>>>>>
>>>>> Super short summary: We want to start using devm actions to clean up drm
>>>>> drivers. Here's the problem:
>>>>> - For a driver unload/unbind without hotunplug, we want to properly clean
>>>>>   up the hardware and shut it all down.
>>>>
>>>> Then do it on probe/disconnect.
>>>>
>>>>> - But if the device is unplugged already, that's probably not the best
>>>>>   idea, and we only want to clean up the kernel's resources/allocations.
>>>>
>>>> Again, probe/disconnect will be called either way.
>>>>
>>>> But as you note, memory will NOT be freed by the devm stuff if you
>>>> manually unbind a driver from a device.
>>>>
>>>> So don't touch hardware there, it's not going to work :)
>>>>
>>>>> What's the recommendation here? I see a few options:
>>>>>
>>>>> - Make sure everything can deal with this properly. Hotunplug can happen
>>>>>   anytime, so there's a race no matter what.
>>>>
>>>> Yes.
>>>>
>>>>> - Check with the device model whether the struct device is disappearing or
>>>>>   whether we're just dealing with a driver unbind (no idea how to do
>>>>>   that), and act accordingly.
>>>>
>>>> You don't know that, sorry.  Just do any hardware stuff on disconnect.
>>>> Assuming your hardware is still present :)
>>>>
>>>>> - Fundamental question: Touching the hw from devm actions, is that ok? If
>>>>>   not, then the pretty nifty plan laid out in this thread wont work.
>>>>
>>>> Nope, that's not going to work, the device could either be long gone, or
>>>> you will not be called due to unbind happening from userspace.
>>>>
>>>> But really, unbind from userspace is very very rare, it's a developer
>>>> thing mostly.  Oh and a virtual driver thing, but those people are crazy
>>>> :)
>>>>
>>>>> - Something completely different?
>>>>
>>>> Do it in disconnect :)
>>>
>>> Ah, I forgot to mention the important constraint :-) disconnect/unbind
>>> should be the following sequence:
>>>
>>> 1. Unregister all the userspace interfaces (there's a lot of them) and
>>> make sure all the pending ioctls are done so that from now on
>>> userspace sees lots of -EIO (in case it still has fd open, which is
>>> going to be the normal for hotunplug.
>>>
>>> 2. Shut down hw and all ongoing operations (only relevant for unbind,
>>> but needs to be able to cope with sudden hotunplug on top anyway).
>>>
>>> 3. Clean up the kernel mess and release everything.
>>>
>>> Probe is exactly the other way round, so would perfectly fit into the
>>> devm onion cleanup. See in the commented earlier replies above how
>>> that would match in details, but tldr; if we have to do 2. in
>>> disconnect, then we also have to do 1. in disconnected, and only doing
>>> 3. through devm is almost not worth the bother. But if we could do all
>>> three through devm then simple drivers wouldn't even need any
>>> disconnect/unbind callback at all. That's our motivation for trying to
>>> come up with an answer that's not "do it in disconnect". "do it in
>>> disconnect" is how we do it all today already.
>>>
>>> Yes we're trying to make tiny drivers even smaller, we have enough
>>> nowadays that this stuff would be worth it :-)
>>>
>>
>> I think a solution is to say that drivers that want to touch hw on
>> disconnect needs to use device_driver->remove to do that.
>>
>> This is an example driver that doesn't need to touch hw because it's so
>> simple that userspace has disabled the pipeline:
>>
>> static void drm_driver_release(struct drm_device *drm)
>> {
>> 	drm_mode_config_cleanup(drm);
>> 	drm_dev_fini(drm);
>> 	kfree(drm);
>> }
>>
>> static struct drm_driver drm_driver = {
>> 	.release = drm_driver_release,
>> 	/* ... */
>> };
>>
>> static int driver_probe(struct device *dev)
>> {
>> 	struct drm_device *drm;
>> 	int ret;
>>
>> 	drm = kzalloc(sizeof(*drm), GFP_KERNEL);
>> 	if (!drm)
>> 		return -ENOMEM;
>>
>> 	ret = devm_drm_dev_init(dev, drm, &drm_driver);
>> 	if (ret) {
>> 		kfree(drm);
>> 		return ret;
>> 	}
>>
>> 	drm_mode_config_init(drm);
>>
>> 	/* Aquire various resources, all managed by devres */
>>
>> 	drm_mode_config_reset(drm);
>>
>> 	return devm_drm_dev_register(drm);
>> }
>>
>> struct device_driver driver = {
>> 	.probe = driver_probe,
>> };
>>
>>
>> A driver that wants to touch hardware on disconnect, can look like this:
>>
>> static void drm_driver_release(struct drm_device *drm)
>> {
>> 	drm_mode_config_cleanup(drm);
>> 	drm_dev_fini(drm);
>> 	kfree(drm);
>> }
>>
>> static struct drm_driver drm_driver = {
>> 	.release = drm_driver_release,
>> 	/* ... */
>> };
>>
>> static int driver_probe(struct device *dev)
>> {
>> 	struct drm_device *drm;
>> 	int ret;
>>
>> 	drm = kzalloc(sizeof(*drm), GFP_KERNEL);
>> 	if (!drm)
>> 		return -ENOMEM;
>>
>> 	ret = devm_drm_dev_init(dev, drm, &drm_driver);
>> 	if (ret) {
>> 		kfree(drm);
>> 		return ret;
>> 	}
>>
>> 	drm_mode_config_init(drm);
>>
>> 	/* Aquire various resources, all managed by devres */
>>
>> 	drm_mode_config_reset(drm);
>>
>> 	ret = drm_dev_register(drm);
>> 	if (ret)
>> 		return ret;
>>
>> 	drm_dev_get(dev); /* If using drm_dev_unplug() */
>>
>> 	dev_set_drvdata(dev, drm);
>>
>> 	return 0;
>> }
>>
>> /* This function is called before devres_release_all() */
>> static int driver_remove(struct device *dev)
>> {
>> 	struct drm_device *drm = dev_get_drvdata(dev);
>>
>> 	drm_dev_unplug(drm); OR drm_dev_unregister(drm);
>> 	drm_atomic_helper_shutdown(drm)
>>
>> 	return 0;
>> }
>>
>> struct device_driver driver = {
>> 	.probe = driver_probe,
>> 	.remove = driver_remove,
> 
> That's exactly the pattern I'm trying to avoid, because imo your tiny
> driver _also_ should do this. Or we realize that all the current drivers
> doing drm_atomic_helper_shutdown are misguided, but I'm not really
> understanding why.
> 

I don't know where my head has been, the pipeline on tinydrm isn't
disabled on driver module unload. I've so preoccupied with ensuring that
device removal is working that I must have had some kind of tunnel vision.

The problem is that fbdev is restored on lastclose regardless of it
being in use or not.

I tried to address it in this patch:

drm/fb-helper/generic: Only restore when in use
https://patchwork.freedesktop.org/patch/263271/

> Having a devm helper which cannot be used for some drivers due to
> fundamental design issues is kinda not great, because it means everyone
> will still use it, and shrug the bugs off as "not my problem". Which is
> what's happening right now with all the devm_kzalloc we have in drm
> drivers that all the get lifetim wrong. But because devm_ is convenient
> everyone uses it, so the driver unload of most drivers is full of bugs.

Yes, it's starting to look like this devm idea isn't such a great thing
after all.

And it looks like dropping DRM devm can still give a slim driver:

static int driver_probe(struct device *dev)
{
	struct drm_device *drm;
	int ret;

	drm = kzalloc(sizeof(*drm), GFP_KERNEL);
	if (!drm)
		return -ENOMEM;

	ret = drm_dev_init(dev, drm, &drm_driver);
	if (ret) {
		kfree(drm);
		return ret;
	}

	drm_mode_config_init(drm);

	/*
	 * Aquire various resources, all managed by devres
	 * or being released in drm_driver->release.
	 * goto err_put on failure
	 */

	drm_mode_config_reset(drm);

	ret = drm_dev_register(drm);
	if (ret)
		goto err_put;

	dev_set_drvdata(dev, drm);

	return 0;

err_put:
	drm_dev_put(dev);

	return ret
}

static int driver_remove(struct device *dev)
{
	struct drm_device *drm = dev_get_drvdata(dev);

	drm_atomic_helper_shutdown(drm)
	drm_dev_unplug(drm);

	return 0;
}

Noralf.


More information about the dri-devel mailing list