devm actions and hw clenaup (was Re: [PATCH 01/11] drm: Add devm_drm_dev_init/register)

Greg KH gregkh at linuxfoundation.org
Tue Jan 29 15:16:12 UTC 2019


On Tue, Jan 29, 2019 at 03:34:46PM +0100, Noralf Trønnes wrote:
> 
> 
> Den 24.01.2019 18.57, skrev Daniel Vetter:
> > On Thu, Jan 24, 2019 at 6:46 PM Greg KH <gregkh at linuxfoundation.org> wrote:
> >>
> >> On Thu, Jan 24, 2019 at 11:43:12AM +0100, Daniel Vetter wrote:
> >>> On Wed, Jan 23, 2019 at 11:54:07AM +0100, Noralf Trønnes wrote:
> >>>>
> >>>>
> >>>> Den 22.01.2019 20.30, skrev Daniel Vetter:
> >>>>> On Tue, Jan 22, 2019 at 8:07 PM Noralf Trønnes <noralf at tronnes.org> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Den 22.01.2019 10.32, skrev Daniel Vetter:
> >>>>>>> On Mon, Jan 21, 2019 at 01:21:46PM +0100, Noralf Trønnes wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Den 21.01.2019 10.55, skrev Daniel Vetter:
> >>>>>>>>> On Mon, Jan 21, 2019 at 10:10:14AM +0100, Daniel Vetter wrote:
> >>>>>>>>>> On Sun, Jan 20, 2019 at 12:43:08PM +0100, Noralf Trønnes wrote:
> >>>>>>>>>>> This adds resource managed (devres) versions of drm_dev_init() and
> >>>>>>>>>>> drm_dev_register().
> >>>>>>>>>>>
> >>>>>>>>>>> Also added is devm_drm_dev_register_with_fbdev() which sets up generic
> >>>>>>>>>>> fbdev emulation as well.
> >>>>>>>>>>>
> >>>>>>>>>>> devm_drm_dev_register() isn't exported since there are no users.
> >>>>>>>>>>>
> >>>>>>>>>>> Signed-off-by: Noralf Trønnes <noralf at tronnes.org>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>> <snip>
> >>>>>>>>
> >>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> >>>>>>>>>>> index 381581b01d48..12129772be45 100644
> >>>>>>>>>>> --- a/drivers/gpu/drm/drm_drv.c
> >>>>>>>>>>> +++ b/drivers/gpu/drm/drm_drv.c
> >>>>>>>>>>> @@ -36,6 +36,7 @@
> >>>>>>>>>>>
> >>>>>>>>>>>  #include <drm/drm_client.h>
> >>>>>>>>>>>  #include <drm/drm_drv.h>
> >>>>>>>>>>> +#include <drm/drm_fb_helper.h>
> >>>>>>>>>>>  #include <drm/drmP.h>
> >>>>>>>>>>>
> >>>>>>>>>>>  #include "drm_crtc_internal.h"
> >>>>>>>>>>> @@ -871,6 +872,111 @@ void drm_dev_unregister(struct drm_device *dev)
> >>>>>>>>>>>  }
> >>>>>>>>>>>  EXPORT_SYMBOL(drm_dev_unregister);
> >>>>>>>>>>>
> >>>>>>>>>>> +static void devm_drm_dev_init_release(void *data)
> >>>>>>>>>>> +{
> >>>>>>>>>>> + drm_dev_put(data);
> >>>>>>>>>
> >>>>>>>>> We need drm_dev_unplug() here, or this isn't safe.
> >>>>>>>>
> >>>>>>>> This function is only used to cover the error path if probe fails before
> >>>>>>>> devm_drm_dev_register() is called. devm_drm_dev_register_release() is
> >>>>>>>> the one that calls unplug. There are comments about this in the functions.
> >>>>>>>
> >>>>>>> I think I get a prize for being ignorant and blind :-/
> >>>>>>>
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>> +}
> >>>>>>>>>>> +
> >>>>>>>>>>> +/**
> >>>>>>>>>>> + * devm_drm_dev_init - Resource managed drm_dev_init()
> >>>>>>>>>>> + * @parent: Parent device object
> >>>>>>>>>>> + * @dev: DRM device
> >>>>>>>>>>> + * @driver: DRM driver
> >>>>>>>>>>> + *
> >>>>>>>>>>> + * Managed drm_dev_init(). The DRM device initialized with this function is
> >>>>>>>>>>> + * automatically released on driver detach. You must supply a
> >>>>>>>>>
> >>>>>>>>> I think a bit more clarity here would be good:
> >>>>>>>>>
> >>>>>>>>> "... automatically released on driver unbind by callind drm_dev_unplug()."
> >>>>>>>>>
> >>>>>>>>>>> + * &drm_driver.release callback to control the finalization explicitly.
> >>>>>>>>>
> >>>>>>>>> I think a loud warning for these is in order:
> >>>>>>>>>
> >>>>>>>>> "WARNING:
> >>>>>>>>>
> >>>>>>>>> "In generally it is unsafe to use devm functions for drm structures
> >>>>>>>>> because the lifetimes of &drm_device and the underlying &device do not
> >>>>>>>>> match. This here works because it doesn't immediately free anything, but
> >>>>>>>>> only calls drm_dev_unplug(), which internally decrements the &drm_device
> >>>>>>>>> refcount through drm_dev_put().
> >>>>>>>>>
> >>>>>>>>> "All other drm structures must still be explicitly released in the
> >>>>>>>>> &drm_driver.release callback."
> >>>>>>>>>
> >>>>>>>>> While thinking about this I just realized that with this design we have no
> >>>>>>>>> good place to call drm_atomic_helper_shutdown(). Which we need to, or all
> >>>>>>>>> kinds of things will leak badly (connectors, fb, ...), but there's no
> >>>>>>>>> place to call it:
> >>>>>>>>> - unbind is too early, since we haven't yet called drm_dev_unplug, and the
> >>>>>>>>>   drm_dev_unregister in there must be called _before_ we start to shut
> >>>>>>>>>   down anything.
> >>>>>>>>> - drm_driver.release is way too late.
> >>>>>>>>>
> >>>>>>>>> Ofc for a real hotunplug there's no point in shutting down the hw (it's
> >>>>>>>>> already gone), but for a driver unload/unbind it would be nice if this
> >>>>>>>>> happens automatically and in the right order.
> >>>>>>>>>
> >>>>>>>>> So not sure what to do here really.
> >>>>>>>>
> >>>>>>>> How about this change: (it breaks the rule of pulling helpers into the
> >>>>>>>> core, so maybe we should put the devm_ functions into the simple KMS
> >>>>>>>> helper instead?)
> >>>>>>>
> >>>>>>> Yeah smells a bit much like midlayer ... What would work is having a pile
> >>>>>>> more devm_ helper functions, so that we onion-unwrap everything correctly,
> >>>>>>> and in the right order. So:
> >>>>>>>
> >>>>>>> - devm_drm_dev_init (always does a drm_dev_put())
> >>>>>>>
> >>>>>>> - devm_drm_poll_enable (shuts down the poll helper with a devm action)
> >>>>>>>
> >>>>>>> - devm_drm_mode_config_reset (does an atomic_helper_shutdown() as it's cleanup action)
> >>>>>>>
> >>>>>>> - devm_drm_dev_register (grabs an additional drm_dev_get() reference so it
> >>>>>>>   can call drm_dev_unplug() unconditionally).
> >>>>>>>
> >>>>>>
> >>>>>> Beautiful! I really like this, it's very flexible.
> >>>>>>
> >>>>>> Where should devm_drm_mode_config_reset() live? It will pull in the
> >>>>>> atomic helper...
> >>>>>
> >>>>> I think a new drm_devm.c helper would be nice for all this stuff.
> >>>>> Especially since you can't freely mix devm-based setup/cleanup with
> >>>>> normal cleanup I think it'd be good to have it all together in one
> >>>>> place. And perhaps even a code example in the DOC: overview.
> >>>>>
> >>>>>>> We'd need to make sure some of the cleanup actions dtrt when the device is
> >>>>>>> gone, but I think we can achieve that by liberally sprinkling
> >>>>>>> drm_dev_enter/exit over them, e.g. the the cleanup action for
> >>>>>>> drm_mode_config_reset would be:
> >>>>>>>
> >>>>>>> {
> >>>>>>>       if (drm_dev_enter())
> >>>>>>>               return;
> >>>>>>>
> >>>>>>>       drm_atomic_helper_shutdown();
> >>>>>>>
> >>>>>>>       drm_dev_exit();
> >>>>>>> }
> >>>>>>>
> >>>>>>
> >>>>>> drm_dev_enter() can only be used to check whether the drm_device is
> >>>>>> registered or not, it doesn't say anything about the state of the parent
> >>>>>> device.
> >>>>>>
> >>>>>> All we know is that the device is being unbound from the driver, we
> >>>>>> don't know if it's the device that's being removed or if it's the driver
> >>>>>> that's unregistered.
> >>>>>
> >>>>> You're right, both paths will have called drm_dev_unplug by then.
> >>>>> Silly me. I really liked my idea :-)
> >>>>>
> >>>>>> I have looked at the various call chains:
> >>>>>>
> >>>>>> driver_unregister ->
> >>>>>>     bus_remove_driver ->
> >>>>>>         driver_detach ->
> >>>>>>             device_release_driver_internal
> >>>>>>
> >>>>>> device_unregister ->
> >>>>>>     device_del ->
> >>>>>>         bus_remove_device ->
> >>>>>>             device_release_driver ->
> >>>>>>                 device_release_driver_internal
> >>>>>>
> >>>>>> sysfs: unbind_store ->
> >>>>>>     device_release_driver ->
> >>>>>>         device_release_driver_internal
> >>>>>>
> >>>>>> The only way I've found to differentiate between these in a cleanup
> >>>>>> action is that device_del() uses the bus notifier to signal
> >>>>>> BUS_NOTIFY_DEL_DEVICE before calling bus_remove_device(). Such a
> >>>>>> notifier could be used to set a drm_device->parent_removed flag.
> >>>>>
> >>>>> Hm, this might upset Greg KH's code taste ... maybe there's a better
> >>>>> way to do this, but best to prototype a patch with this, send it to
> >>>>> him and ask how to :-)
> >>>>>
> >>>>
> >>>> I'll leave this to the one that needs it. The tinydrm drivers doesn't
> >>>> need to touch hw after DRM unregister.
> >>>>
> >>>>>> Why is it necessary to call drm_atomic_helper_shutdown() here? Doesn't
> >>>>>> everything get disabled when userspace closes? It does in my tinydrm
> >>>>>> world :-)
> >>>>>
> >>>>> Iirc fbdev/fbcon can result in leaks ... at least we've had patches
> >>>>> where drivers leaked drm_connector and drm_framebuffer objects, and
> >>>>> they've been fixed by calling drm_atomic_helper_shutdown() in the
> >>>>> unload path. Maybe this is cargo-culting, but it goes way back to
> >>>>> pre-atomic, where drivers called drm_helper_force_disable_all().
> >>>>>
> >>>>> If you try to move the fbcon to your tinydrm drivers (con2fb is
> >>>>> apparently the cmdline tool you need, never tried it, I only switch
> >>>>> the kernel's console between fbcon and dummycon and back, not what
> >>>>> fbcon drivers itself), then I think you should be able to reproduce.
> >>>>> And maybe you have a better idea how to deal with this all.
> >>>>>
> >>>>> Note also that there's been proposals floating around to only close an
> >>>>> drm_framebuffer, not also remove it (like the current RMFB ioctl
> >>>>> does), with that closing userspace would not necessarily lead to a
> >>>>> full cleanup.
> >>>>>
> >>>>> Another thing (which doesn't apply to drm_simple_display_pipe drivers)
> >>>>> is if you have the display on, but no planes showing (i.e. all black).
> >>>>> Then all the fbs will be cleaned up, but drm_connector will be
> >>>>> leaking. That's a case where you need drm_atomic_helper_shutdown()
> >>>>> even if fbcon/fbdev isn't even enabled.
> >>>>>
> >>>>
> >>>> Ok, this means that I don't need to call drm_atomic_helper_shutdown() in
> >>>> tinydrm. DRM userspace disables the pipe on close and the generic fbdev
> >>>> emulation also releases everything.
> >>>> Even so, maybe I should use devm_drm_mode_config_reset() after all to
> >>>> keep drivers uniform, to avoid confusion: why doesn't he use it?
> >>>
> >>> Hm maybe there is an official way to solve this, pulling in Greg+Rafael.
> >>>
> >>> Super short summary: We want to start using devm actions to clean up drm
> >>> drivers. Here's the problem:
> >>> - For a driver unload/unbind without hotunplug, we want to properly clean
> >>>   up the hardware and shut it all down.
> >>
> >> Then do it on probe/disconnect.
> >>
> >>> - But if the device is unplugged already, that's probably not the best
> >>>   idea, and we only want to clean up the kernel's resources/allocations.
> >>
> >> Again, probe/disconnect will be called either way.
> >>
> >> But as you note, memory will NOT be freed by the devm stuff if you
> >> manually unbind a driver from a device.
> >>
> >> So don't touch hardware there, it's not going to work :)
> >>
> >>> What's the recommendation here? I see a few options:
> >>>
> >>> - Make sure everything can deal with this properly. Hotunplug can happen
> >>>   anytime, so there's a race no matter what.
> >>
> >> Yes.
> >>
> >>> - Check with the device model whether the struct device is disappearing or
> >>>   whether we're just dealing with a driver unbind (no idea how to do
> >>>   that), and act accordingly.
> >>
> >> You don't know that, sorry.  Just do any hardware stuff on disconnect.
> >> Assuming your hardware is still present :)
> >>
> >>> - Fundamental question: Touching the hw from devm actions, is that ok? If
> >>>   not, then the pretty nifty plan laid out in this thread wont work.
> >>
> >> Nope, that's not going to work, the device could either be long gone, or
> >> you will not be called due to unbind happening from userspace.
> >>
> >> But really, unbind from userspace is very very rare, it's a developer
> >> thing mostly.  Oh and a virtual driver thing, but those people are crazy
> >> :)
> >>
> >>> - Something completely different?
> >>
> >> Do it in disconnect :)
> > 
> > Ah, I forgot to mention the important constraint :-) disconnect/unbind
> > should be the following sequence:
> > 
> > 1. Unregister all the userspace interfaces (there's a lot of them) and
> > make sure all the pending ioctls are done so that from now on
> > userspace sees lots of -EIO (in case it still has fd open, which is
> > going to be the normal for hotunplug.
> > 
> > 2. Shut down hw and all ongoing operations (only relevant for unbind,
> > but needs to be able to cope with sudden hotunplug on top anyway).
> > 
> > 3. Clean up the kernel mess and release everything.
> > 
> > Probe is exactly the other way round, so would perfectly fit into the
> > devm onion cleanup. See in the commented earlier replies above how
> > that would match in details, but tldr; if we have to do 2. in
> > disconnect, then we also have to do 1. in disconnected, and only doing
> > 3. through devm is almost not worth the bother. But if we could do all
> > three through devm then simple drivers wouldn't even need any
> > disconnect/unbind callback at all. That's our motivation for trying to
> > come up with an answer that's not "do it in disconnect". "do it in
> > disconnect" is how we do it all today already.
> > 
> > Yes we're trying to make tiny drivers even smaller, we have enough
> > nowadays that this stuff would be worth it :-)
> > 
> 
> I think a solution is to say that drivers that want to touch hw on
> disconnect needs to use device_driver->remove to do that.

That's what I was trying to say above "remove" is what I was trying to
say is "disconnect".

thanks,

greg k-h


More information about the dri-devel mailing list