[Intel-gfx] [PATCH 2/6] drm/drv: Prepare to remove drm_dev_unplug()
Daniel Vetter
daniel at ffwll.ch
Wed Feb 6 20:36:05 UTC 2019
On Wed, Feb 06, 2019 at 05:46:51PM +0100, Noralf Trønnes wrote:
>
>
> Den 06.02.2019 16.26, skrev Daniel Vetter:
> > On Tue, Feb 05, 2019 at 06:57:50PM +0100, Noralf Trønnes wrote:
> >>
> >>
> >> Den 05.02.2019 17.31, skrev Daniel Vetter:
> >>> On Tue, Feb 05, 2019 at 11:20:55AM +0100, Noralf Trønnes wrote:
> >>>>
> >>>>
> >>>> Den 05.02.2019 10.11, skrev Daniel Vetter:
> >>>>> On Mon, Feb 04, 2019 at 06:35:28PM +0100, Noralf Trønnes wrote:
> >>>>>>
> >>>>>>
> >>>>>> Den 04.02.2019 16.41, skrev Daniel Vetter:
> >>>>>>> On Sun, Feb 03, 2019 at 04:41:56PM +0100, Noralf Trønnes wrote:
> >>>>>>>> The only thing now that makes drm_dev_unplug() special is that it sets
> >>>>>>>> drm_device->unplugged. Move this code to drm_dev_unregister() so that we
> >>>>>>>> can remove drm_dev_unplug().
> >>>>>>>>
> >>>>>>>> Signed-off-by: Noralf Trønnes <noralf at tronnes.org>
> >>>>>>>> ---
> >>>>>>
> >>>>>> [...]
> >>>>>>
> >>>>>>>> drivers/gpu/drm/drm_drv.c | 27 +++++++++++++++------------
> >>>>>>>> include/drm/drm_drv.h | 10 ++++------
> >>>>>>>> 2 files changed, 19 insertions(+), 18 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> >>>>>>>> index 05bbc2b622fc..e0941200edc6 100644
> >>>>>>>> --- a/drivers/gpu/drm/drm_drv.c
> >>>>>>>> +++ b/drivers/gpu/drm/drm_drv.c
> >>>>>>>> @@ -366,15 +366,6 @@ EXPORT_SYMBOL(drm_dev_exit);
> >>>>>>>> */
> >>>>>>>> void drm_dev_unplug(struct drm_device *dev)
> >>>>>>>> {
> >>>>>>>> - /*
> >>>>>>>> - * After synchronizing any critical read section is guaranteed to see
> >>>>>>>> - * the new value of ->unplugged, and any critical section which might
> >>>>>>>> - * still have seen the old value of ->unplugged is guaranteed to have
> >>>>>>>> - * finished.
> >>>>>>>> - */
> >>>>>>>> - dev->unplugged = true;
> >>>>>>>> - synchronize_srcu(&drm_unplug_srcu);
> >>>>>>>> -
> >>>>>>>> drm_dev_unregister(dev);
> >>>>>>>> drm_dev_put(dev);
> >>>>>>>> }
> >>>>>>>> @@ -832,11 +823,14 @@ EXPORT_SYMBOL(drm_dev_register);
> >>>>>>>> * drm_dev_register() but does not deallocate the device. The caller must call
> >>>>>>>> * drm_dev_put() to drop their final reference.
> >>>>>>>> *
> >>>>>>>> - * A special form of unregistering for hotpluggable devices is drm_dev_unplug(),
> >>>>>>>> - * which can be called while there are still open users of @dev.
> >>>>>>>> + * This function can be called while there are still open users of @dev as long
> >>>>>>>> + * as the driver protects its device resources using drm_dev_enter() and
> >>>>>>>> + * drm_dev_exit().
> >>>>>>>> *
> >>>>>>>> * This should be called first in the device teardown code to make sure
> >>>>>>>> - * userspace can't access the device instance any more.
> >>>>>>>> + * userspace can't access the device instance any more. Drivers that support
> >>>>>>>> + * device unplug will probably want to call drm_atomic_helper_shutdown() first
> >>>>>>>
> >>>>>>> Read once more with a bit more coffee, spotted this:
> >>>>>>>
> >>>>>>> s/first/afterwards/ - shutting down the hw before we've taken it away from
> >>>>>>> userspace is kinda the wrong way round. It should be the inverse of driver
> >>>>>>> load, which is 1) allocate structures 2) prep hw 3) register driver with
> >>>>>>> the world (simplified ofc).
> >>>>>>>
> >>>>>>
> >>>>>> The problem is that drm_dev_unregister() sets the device as unplugged
> >>>>>> and if drm_atomic_helper_shutdown() is called afterwards it's not
> >>>>>> allowed to touch hardware.
> >>>>>>
> >>>>>> I know it's the wrong order, but the only way to do it in the right
> >>>>>> order is to have a separate function that sets unplugged:
> >>>>>>
> >>>>>> drm_dev_unregister();
> >>>>>> drm_atomic_helper_shutdown();
> >>>>>> drm_dev_set_unplugged();
> >>>>>
> >>>>> Annoying ... but yeah calling _shutdown() before we stopped userspace is
> >>>>> also not going to work. Because userspace could quickly re-enable
> >>>>> something, and then the refcounts would be all wrong again and leaking
> >>>>> objects.
> >>>>>
> >>>>
> >>>> What happens with a USB device that is unplugged with open userspace,
> >>>> will that leak objects?
> >>>
> >>> Maybe we've jumped to conclusions. drm_atomic_helper_shutdown() will run
> >>> as normal, the only thing that should be skipped is actually touching the
> >>> hw (as long as the driver doesn't protect too much with
> >>> drm_dev_enter/exit). So all the software updates (including refcounting
> >>> updates) will still be done. Ofc current udl is not yet atomic, so in
> >>> reality something else happens.
> >>>
> >>> And we ofc still have the same issue that if you just unload the driver,
> >>> then the hw will stay on (which might really confuse the driver on next
> >>> load, when it assumes that it only gets loaded from cold boot where
> >>> everything is off - which usually is the case on an arm soc at least).
> >>>
> >>>>> I get a bit the feeling we're over-optimizing here with trying to devm-ize
> >>>>> drm_dev_register. Just getting drm_device correctly devm-ized is a big
> >>>>> step forward already, and will open up a lot of TODO items across a lot of
> >>>>> drivers. E.g. we could add a drm_dev_kzalloc, for allocating all the drm_*
> >>>>> structs, which gets released together with drm_device. I think that's a
> >>>>> much clearer path forward, I think we all agree that getting the kfree out
> >>>>> of the driver codes is a good thing, and it would allow us to do this
> >>>>> correctly.
> >>>>>
> >>>>> Then once we have that and rolled out to a few drivers we can reconsider
> >>>>> the entire unregister/shutdown gordian knot here. Atm I just have no idea
> >>>>> how to do this properly :-/
> >>>>>
> >>>>> Thoughts, other ideas?
> >>>>>
> >>>>
> >>>> Yeah, I've come to the conclusion that devm_drm_dev_register() doesn't
> >>>> make much sense if we need a driver remove callback anyways.
> >>>
> >>> Yup, that's what I meant with the above:
> >>> - merge devm_drm_dev_register, use it a lot. This is definitely a good
> >>> idea.
> >>> - create a drm_dev_kzalloc, which automatically releases memory on the
> >>> final drm_dev_put. Use it every in drivers for drm structures,
> >>> especially in those drivers that currently use devm (which releases
> >>> those allocations potentialy too early on unplug).
> >>> - figure out the next steps after that
> >>>
> >>>> I think devm_drm_dev_init() makes sense because it yields a cleaner
> >>>> probe() function. An additional benefit is that it requires a
> >>>> drm_driver->release function which is a step in the right direction to
> >>>> get the drm_device lifetime right.
> >>>>
> >>>> Do we agree that a drm_dev_set_unplugged() function is necessary to get
> >>>> the remove/disconnect order right?
> >>>>
> >>>> What about drm_dev_unplug() maybe I should just leave it be?
> >>>>
> >>>> - amd uses drm_driver->unload, so that one takes some work to get right
> >>>> to support unplug. It doesn't check the unplugged state, so really
> >>>> doesn't need drm_dev_unplug() I guess. Do they have cards that can be
> >>>> hotplugged?
> >>>
> >>> Yeah amd still uses ->load and ->unload, which is not great unfortunately.
> >>> I just stumbled over that when I tried to make a series to disable the
> >>> global drm_global_mutex for most drivers ...
> >>>
> >>>> - udl uses drm_driver->unload, doesn't use drm_atomic_helper_shutdown().
> >>>> It has only one drm_dev_is_unplugged() check and that is in its
> >>>> fbdev->open callback.
> >>>
> >>> udl isn't atomic, so can't use the atomic helpers. pre-atomic doesn't have
> >>> refcounting issues when something is left on iirc. I think udl is written
> >>> under the assumption that ->unload is always called for an unplug, never
> >>> for an actual unload.
> >>>
> >>>> - xen calls drm_atomic_helper_shutdown() in its drm_driver->release
> >>>> callback which I guess is not correct.
> >>>
> >>> Yeah this smells fishy. ->release is supposed to be for cleaning up kernel
> >>> structures, not for cleaning up the hw. So maybe drm_mode_config_cleanup
> >>> could be put there, not sure honestly. But definitely not _shutdown().
> >>>
> >>>> This is what it will look like with a set unplugged function:
> >>>>
> >>>> void drm_dev_unplug(struct drm_device *dev)
> >>>> {
> >>>> drm_dev_set_unplugged(dev);
> >>>> drm_dev_unregister(dev);
> >>>> drm_dev_put(dev);
> >>>> }
> >>>>
> >>>> Hm, I should probably remove it to avoid further use of it since it's
> >>>> not correct for a modern driver.
> >>>
> >>> I think something flew over my head ... what's the drm_dev_set_unplugged
> >>> for? I think I'm not following you ...
> >>>
> >>
> >> Taking it a few steps back:
> >>
> >> This patch proposes to move 'dev->unplugged = true' from
> >> drm_dev_unplug() to drm_dev_unregister().
> >>
> >> Additionally I proposed this change to the drm_dev_unregister() docs:
> >>
> >> * This should be called first in the device teardown code to make sure
> >> - * userspace can't access the device instance any more.
> >> + * userspace can't access the device instance any more. Drivers that
> >> support
> >> + * device unplug will probably want to call
> >> drm_atomic_helper_shutdown() first
> >> + * in order to disable the hardware on regular driver module unload.
> >>
> >> Which would give a driver remove callback like this:
> >>
> >> static int driver_remove()
> >> {
> >> drm_atomic_helper_shutdown();
> >> drm_dev_unregister();
> >> }
> >>
> >> Your reaction was that drm_atomic_helper_shutdown() needs to be called
> >> after drm_dev_unregister() to avoid a race resulting in leaked objects.
> >> However if we call it afterwards, ->unplugged will be true and the
> >> driver can't touch hardware.
> >>
> >> Then I proposed moving the unplugged state setting to:
> >>
> >> void drm_dev_set_unplugged(struct drm_device *dev)
> >> {
> >> dev->unplugged = true;
> >> synchronize_srcu(&drm_unplug_srcu);
> >> }
> >>
> >> Now it is possible to avoid the race and still touch hardware:
> >>
> >> static int driver_remove()
> >> {
> >> drm_dev_unregister();
> >> drm_atomic_helper_shutdown();
> >> drm_dev_set_unplugged();
> >> }
> >>
> >> But now I'm back to the question: Is it the driver or the device that is
> >> going away?
> >>
> >> If it's the driver we are fine touching hardware, if it's the device it
> >> depends on how we access the device resource and whether the subsystem
> >> has protection in place to handle access after the device is gone. I
> >> think USB can handle and block device access up until the disconnect
> >> callback has finished (no point in doing so though, since the normal
> >> operation is that the device is gone, not the driver unloading).
> >>
> >> Is there a way to determine who's going away without changes to the
> >> device core?
> >>
> >> Maybe. The driver can only be unregistered if there are no open file
> >> handles because a ref is taken on the driver module.
> >
> > This isn't true. You can (not many people do, but it's possible) to unbind
> > through sysfs. The module reference only keeps the cpu instructions valid,
> > nothing else.
> >
> >> So maybe something along these lines:
> >>
> >> int drm_dev_open_count(struct drm_device *dev)
> >> {
> >> int open_count;
> >>
> >> mutex_lock(&drm_global_mutex);
> >> open_count = dev->open_count;
> >> mutex_unlock(&drm_global_mutex);
> >
> > Random style nit: READ_ONCE, no locking needed (the locks don't change
> > anything, except if you have really strange locking rules). Serves
> > double-duty as a huge warning sign that something tricky is happening.
> >
> >> return open_count;
> >> }
> >>
> >> static int driver_remove()
> >> {
> >> drm_dev_unregister();
> >>
> >> open_count = drm_dev_open_count();
> >>
> >> /* Open fd's, device is going away, block access */
> >> if (open_count)
> >> drm_dev_set_unplugged();
> >>
> >> drm_atomic_helper_shutdown();
> >>
> >> /* No open fd's, driver is going away */
> >> if (!open_count)
> >> drm_dev_set_unplugged();
> >> }
> >>
> >>
> >> Personally I have 2 use cases:
> >> - tinydrm SPI drivers
> >> The only hotpluggable SPI controllers I know of are USB which should
> >> handle device access while unregistering.
> >> SPI devices can be removed, but the controller driver is still in
> >> place so it's safe.
> >> - A future USB driver (that I hope to work on when all this core stuff
> >> is in place).
> >> There's no point in touching the hw, so DRM can be set uplugged right
> >> away in the disconnect() callback.
> >>
> >> This means that I don't need to know who's going away for my use cases.
> >>
> >> This turned out to be quite long winding, but the bottom line is that
> >> having a separate function to set the unplugged state seems to give a
> >> lot of flexibility for various use cases.
> >>
> >> I hope I didn't muddy the waters even more :-)
> >
> > Hm, I think I get your idea. Since I'm still not sure what the best option
> > is I'm leaning towards just leaving stuff as-is. I.e. drivers which want
> > to shut down hw can do the 1. drm_dev_unregister() 2.
> > drm_atomic_helper_shutdown() sequence. Drivers which care about hotunplug
> > more can do just the drm_dev_unplug().
> >
> > Yes it's messy and unsatisfactory, but in case of serious doubt I like to
> > wait until maybe in the future we have a good idea. Meanwhile leaving a
> > bit of a mess around is better than charging ahead in a possibly totally
> > wrong direction.
> >
> > There's cases where clue still hasn't hit me, even years later (or anyone
> > else). That's just how it is sometimes.
> >
>
> Ok, I'll drop this.
>
> This means that I'll drop devm_drm_dev_init() as well since it doesn't
> play well with drm_dev_unplug() since both will do drm_dev_put(). Not a
> big deal really, it just means that I need to add one error path in the
> probe function so I can drm_dev_put() on error.
Hm, I think we could just move the drm_dev_put out from drm_dev_unplug.
And then encourage everyone to use devm_drm_dev_init() everywhere. I do
like to get started with auto-cleaned up in drm somehow, and
devm_drm_dev_init doing the drm_dev_put() looks like a really good idea.
> Are you still ok with the first bug fix patch in this series?
Yeah that one still looks good.
Cheers, Daniel
>
> > Zooming out more looking at the big picture I'd say all your work in the
> > past few years has enormously simplified drm for simple drivers already.
> > If we can't resolve this one here right now that just means you "only"
> > made drm 98% simpler instead of maybe 99%. It's still an epic win :-)
> >
>
> Thanks, your mentoring and reviews helped turn my rough ideas into
> something useful :-)
>
> Noralf.
>
> > Cheers, Daniel
> >
> >
> >> Noralf.
> >>
> >>> Thanks, Daniel
> >>>
> >>>>
> >>>> Noralf.
> >>>>
> >>>>> Cheers, Daniel
> >>>>>
> >>>>>> Noralf.
> >>>>>>
> >>>>>>>> + * in order to disable the hardware on regular driver module unload.
> >>>>>>>> */
> >>>>>>>> void drm_dev_unregister(struct drm_device *dev)
> >>>>>>>> {
> >>>>>>>> @@ -845,6 +839,15 @@ void drm_dev_unregister(struct drm_device *dev)
> >>>>>>>> if (drm_core_check_feature(dev, DRIVER_LEGACY))
> >>>>>>>> drm_lastclose(dev);
> >>>>>>>>
> >>>>>>>> + /*
> >>>>>>>> + * After synchronizing any critical read section is guaranteed to see
> >>>>>>>> + * the new value of ->unplugged, and any critical section which might
> >>>>>>>> + * still have seen the old value of ->unplugged is guaranteed to have
> >>>>>>>> + * finished.
> >>>>>>>> + */
> >>>>>>>> + dev->unplugged = true;
> >>>>>>>> + synchronize_srcu(&drm_unplug_srcu);
> >>>>>>>> +
> >>>>>>>> dev->registered = false;
> >>>>>>>>
> >>>>>>>> drm_client_dev_unregister(dev);
> >>>>>>>> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> >>>>>>>> index ca46a45a9cce..c50696c82a42 100644
> >>>>>>>> --- a/include/drm/drm_drv.h
> >>>>>>>> +++ b/include/drm/drm_drv.h
> >>>>>>>> @@ -736,13 +736,11 @@ void drm_dev_unplug(struct drm_device *dev);
> >>>>>>>> * drm_dev_is_unplugged - is a DRM device unplugged
> >>>>>>>> * @dev: DRM device
> >>>>>>>> *
> >>>>>>>> - * This function can be called to check whether a hotpluggable is unplugged.
> >>>>>>>> - * Unplugging itself is singalled through drm_dev_unplug(). If a device is
> >>>>>>>> - * unplugged, these two functions guarantee that any store before calling
> >>>>>>>> - * drm_dev_unplug() is visible to callers of this function after it completes
> >>>>>>>> + * This function can be called to check whether @dev is unregistered. This can
> >>>>>>>> + * be used to detect that the underlying parent device is gone.
> >>>>>>>
> >>>>>>> I think it'd be good to keep the first part, and just update the reference
> >>>>>>> to drm_dev_unregister. So:
> >>>>>>>
> >>>>>>> * This function can be called to check whether a hotpluggable is unplugged.
> >>>>>>> * Unplugging itself is singalled through drm_dev_unregister(). If a device is
> >>>>>>> * unplugged, these two functions guarantee that any store before calling
> >>>>>>> * drm_dev_unregister() is visible to callers of this function after it
> >>>>>>> * completes.
> >>>>>>>
> >>>>>>> I think your version shrugs a few important details under the rug. With
> >>>>>>> those nits addressed:
> >>>>>>>
> >>>>>>> Reviewed-by: Daniel Vetter <daniel.vetter at ffwll.ch>
> >>>>>>>
> >>>>>>> Cheers, Daniel
> >>>>>>>
> >>>>>>>> *
> >>>>>>>> - * WARNING: This function fundamentally races against drm_dev_unplug(). It is
> >>>>>>>> - * recommended that drivers instead use the underlying drm_dev_enter() and
> >>>>>>>> + * WARNING: This function fundamentally races against drm_dev_unregister(). It
> >>>>>>>> + * is recommended that drivers instead use the underlying drm_dev_enter() and
> >>>>>>>> * drm_dev_exit() function pairs.
> >>>>>>>> */
> >>>>>>>> static inline bool drm_dev_is_unplugged(struct drm_device *dev)
> >>>>>>>> --
> >>>>>>>> 2.20.1
> >>>>>>>>
> >>>>>>>> _______________________________________________
> >>>>>>>> Intel-gfx mailing list
> >>>>>>>> Intel-gfx at lists.freedesktop.org
> >>>>>>>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> >>>>>>>
> >>>>>
> >>>
> >
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
More information about the amd-gfx
mailing list