[Intel-gfx] [PATCH 2/6] drm/drv: Prepare to remove drm_dev_unplug()

Daniel Vetter daniel at ffwll.ch
Wed Feb 6 20:36:05 UTC 2019


On Wed, Feb 06, 2019 at 05:46:51PM +0100, Noralf Trønnes wrote:
> 
> 
> Den 06.02.2019 16.26, skrev Daniel Vetter:
> > On Tue, Feb 05, 2019 at 06:57:50PM +0100, Noralf Trønnes wrote:
> >>
> >>
> >> Den 05.02.2019 17.31, skrev Daniel Vetter:
> >>> On Tue, Feb 05, 2019 at 11:20:55AM +0100, Noralf Trønnes wrote:
> >>>>
> >>>>
> >>>> Den 05.02.2019 10.11, skrev Daniel Vetter:
> >>>>> On Mon, Feb 04, 2019 at 06:35:28PM +0100, Noralf Trønnes wrote:
> >>>>>>
> >>>>>>
> >>>>>> Den 04.02.2019 16.41, skrev Daniel Vetter:
> >>>>>>> On Sun, Feb 03, 2019 at 04:41:56PM +0100, Noralf Trønnes wrote:
> >>>>>>>> The only thing now that makes drm_dev_unplug() special is that it sets
> >>>>>>>> drm_device->unplugged. Move this code to drm_dev_unregister() so that we
> >>>>>>>> can remove drm_dev_unplug().
> >>>>>>>>
> >>>>>>>> Signed-off-by: Noralf Trønnes <noralf at tronnes.org>
> >>>>>>>> ---
> >>>>>>
> >>>>>> [...]
> >>>>>>
> >>>>>>>>  drivers/gpu/drm/drm_drv.c | 27 +++++++++++++++------------
> >>>>>>>>  include/drm/drm_drv.h     | 10 ++++------
> >>>>>>>>  2 files changed, 19 insertions(+), 18 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> >>>>>>>> index 05bbc2b622fc..e0941200edc6 100644
> >>>>>>>> --- a/drivers/gpu/drm/drm_drv.c
> >>>>>>>> +++ b/drivers/gpu/drm/drm_drv.c
> >>>>>>>> @@ -366,15 +366,6 @@ EXPORT_SYMBOL(drm_dev_exit);
> >>>>>>>>   */
> >>>>>>>>  void drm_dev_unplug(struct drm_device *dev)
> >>>>>>>>  {
> >>>>>>>> -	/*
> >>>>>>>> -	 * After synchronizing any critical read section is guaranteed to see
> >>>>>>>> -	 * the new value of ->unplugged, and any critical section which might
> >>>>>>>> -	 * still have seen the old value of ->unplugged is guaranteed to have
> >>>>>>>> -	 * finished.
> >>>>>>>> -	 */
> >>>>>>>> -	dev->unplugged = true;
> >>>>>>>> -	synchronize_srcu(&drm_unplug_srcu);
> >>>>>>>> -
> >>>>>>>>  	drm_dev_unregister(dev);
> >>>>>>>>  	drm_dev_put(dev);
> >>>>>>>>  }
> >>>>>>>> @@ -832,11 +823,14 @@ EXPORT_SYMBOL(drm_dev_register);
> >>>>>>>>   * drm_dev_register() but does not deallocate the device. The caller must call
> >>>>>>>>   * drm_dev_put() to drop their final reference.
> >>>>>>>>   *
> >>>>>>>> - * A special form of unregistering for hotpluggable devices is drm_dev_unplug(),
> >>>>>>>> - * which can be called while there are still open users of @dev.
> >>>>>>>> + * This function can be called while there are still open users of @dev as long
> >>>>>>>> + * as the driver protects its device resources using drm_dev_enter() and
> >>>>>>>> + * drm_dev_exit().
> >>>>>>>>   *
> >>>>>>>>   * This should be called first in the device teardown code to make sure
> >>>>>>>> - * userspace can't access the device instance any more.
> >>>>>>>> + * userspace can't access the device instance any more. Drivers that support
> >>>>>>>> + * device unplug will probably want to call drm_atomic_helper_shutdown() first
> >>>>>>>
> >>>>>>> Read once more with a bit more coffee, spotted this:
> >>>>>>>
> >>>>>>> s/first/afterwards/ - shutting down the hw before we've taken it away from
> >>>>>>> userspace is kinda the wrong way round. It should be the inverse of driver
> >>>>>>> load, which is 1) allocate structures 2) prep hw 3) register driver with
> >>>>>>> the world (simplified ofc).
> >>>>>>>
> >>>>>>
> >>>>>> The problem is that drm_dev_unregister() sets the device as unplugged
> >>>>>> and if drm_atomic_helper_shutdown() is called afterwards it's not
> >>>>>> allowed to touch hardware.
> >>>>>>
> >>>>>> I know it's the wrong order, but the only way to do it in the right
> >>>>>> order is to have a separate function that sets unplugged:
> >>>>>>
> >>>>>> 	drm_dev_unregister();
> >>>>>> 	drm_atomic_helper_shutdown();
> >>>>>> 	drm_dev_set_unplugged();
> >>>>>
> >>>>> Annoying ... but yeah calling _shutdown() before we stopped userspace is
> >>>>> also not going to work. Because userspace could quickly re-enable
> >>>>> something, and then the refcounts would be all wrong again and leaking
> >>>>> objects.
> >>>>>
> >>>>
> >>>> What happens with a USB device that is unplugged with open userspace,
> >>>> will that leak objects?
> >>>
> >>> Maybe we've jumped to conclusions. drm_atomic_helper_shutdown() will run
> >>> as normal, the only thing that should be skipped is actually touching the
> >>> hw (as long as the driver doesn't protect too much with
> >>> drm_dev_enter/exit). So all the software updates (including refcounting
> >>> updates) will still be done. Ofc current udl is not yet atomic, so in
> >>> reality something else happens.
> >>>
> >>> And we ofc still have the same issue that if you just unload the driver,
> >>> then the hw will stay on (which might really confuse the driver on next
> >>> load, when it assumes that it only gets loaded from cold boot where
> >>> everything is off - which usually is the case on an arm soc at least).
> >>>
> >>>>> I get a bit the feeling we're over-optimizing here with trying to devm-ize
> >>>>> drm_dev_register. Just getting drm_device correctly devm-ized is a big
> >>>>> step forward already, and will open up a lot of TODO items across a lot of
> >>>>> drivers. E.g. we could add a drm_dev_kzalloc, for allocating all the drm_*
> >>>>> structs, which gets released together with drm_device. I think that's a
> >>>>> much clearer path forward, I think we all agree that getting the kfree out
> >>>>> of the driver codes is a good thing, and it would allow us to do this
> >>>>> correctly.
> >>>>>
> >>>>> Then once we have that and rolled out to a few drivers we can reconsider
> >>>>> the entire unregister/shutdown gordian knot here. Atm I just have no idea
> >>>>> how to do this properly :-/
> >>>>>
> >>>>> Thoughts, other ideas?
> >>>>>
> >>>>
> >>>> Yeah, I've come to the conclusion that devm_drm_dev_register() doesn't
> >>>> make much sense if we need a driver remove callback anyways.
> >>>
> >>> Yup, that's what I meant with the above:
> >>> - merge devm_drm_dev_register, use it a lot. This is definitely a good
> >>>   idea.
> >>> - create a drm_dev_kzalloc, which automatically releases memory on the
> >>>   final drm_dev_put. Use it every in drivers for drm structures,
> >>>   especially in those drivers that currently use devm (which releases
> >>>   those allocations potentialy too early on unplug).
> >>> - figure out the next steps after that
> >>>
> >>>> I think devm_drm_dev_init() makes sense because it yields a cleaner
> >>>> probe() function. An additional benefit is that it requires a
> >>>> drm_driver->release function which is a step in the right direction to
> >>>> get the drm_device lifetime right.
> >>>>
> >>>> Do we agree that a drm_dev_set_unplugged() function is necessary to get
> >>>> the remove/disconnect order right?
> >>>>
> >>>> What about drm_dev_unplug() maybe I should just leave it be?
> >>>>
> >>>> - amd uses drm_driver->unload, so that one takes some work to get right
> >>>>   to support unplug. It doesn't check the unplugged state, so really
> >>>>   doesn't need drm_dev_unplug() I guess. Do they have cards that can be
> >>>>   hotplugged?
> >>>
> >>> Yeah amd still uses ->load and ->unload, which is not great unfortunately.
> >>> I just stumbled over that when I tried to make a series to disable the
> >>> global drm_global_mutex for most drivers ...
> >>>
> >>>> - udl uses drm_driver->unload, doesn't use drm_atomic_helper_shutdown().
> >>>>   It has only one drm_dev_is_unplugged() check and that is in its
> >>>>   fbdev->open callback.
> >>>
> >>> udl isn't atomic, so can't use the atomic helpers. pre-atomic doesn't have
> >>> refcounting issues when something is left on iirc. I think udl is written
> >>> under the assumption that ->unload is always called for an unplug, never
> >>> for an actual unload.
> >>>
> >>>> - xen calls drm_atomic_helper_shutdown() in its drm_driver->release
> >>>>   callback which I guess is not correct.
> >>>
> >>> Yeah this smells fishy. ->release is supposed to be for cleaning up kernel
> >>> structures, not for cleaning up the hw. So maybe drm_mode_config_cleanup
> >>> could be put there, not sure honestly. But definitely not _shutdown().
> >>>
> >>>> This is what it will look like with a set unplugged function:
> >>>>
> >>>> void drm_dev_unplug(struct drm_device *dev)
> >>>> {
> >>>> 	drm_dev_set_unplugged(dev);
> >>>> 	drm_dev_unregister(dev);
> >>>> 	drm_dev_put(dev);
> >>>> }
> >>>>
> >>>> Hm, I should probably remove it to avoid further use of it since it's
> >>>> not correct for a modern driver.
> >>>
> >>> I think something flew over my head ... what's the drm_dev_set_unplugged
> >>> for? I think I'm not following you ...
> >>>
> >>
> >> Taking it a few steps back:
> >>
> >> This patch proposes to move 'dev->unplugged = true' from
> >> drm_dev_unplug() to drm_dev_unregister().
> >>
> >> Additionally I proposed this change to the drm_dev_unregister() docs:
> >>
> >>   * This should be called first in the device teardown code to make sure
> >> - * userspace can't access the device instance any more.
> >> + * userspace can't access the device instance any more. Drivers that
> >> support
> >> + * device unplug will probably want to call
> >> drm_atomic_helper_shutdown() first
> >> + * in order to disable the hardware on regular driver module unload.
> >>
> >> Which would give a driver remove callback like this:
> >>
> >> static int driver_remove()
> >> {
> >> 	drm_atomic_helper_shutdown();
> >> 	drm_dev_unregister();
> >> }
> >>
> >> Your reaction was that drm_atomic_helper_shutdown() needs to be called
> >> after drm_dev_unregister() to avoid a race resulting in leaked objects.
> >> However if we call it afterwards, ->unplugged will be true and the
> >> driver can't touch hardware.
> >>
> >> Then I proposed moving the unplugged state setting to:
> >>
> >> void drm_dev_set_unplugged(struct drm_device *dev)
> >> {
> >> 	dev->unplugged = true;
> >> 	synchronize_srcu(&drm_unplug_srcu);
> >> }
> >>
> >> Now it is possible to avoid the race and still touch hardware:
> >>
> >> static int driver_remove()
> >> {
> >> 	drm_dev_unregister();
> >> 	drm_atomic_helper_shutdown();
> >> 	drm_dev_set_unplugged();
> >> }
> >>
> >> But now I'm back to the question: Is it the driver or the device that is
> >> going away?
> >>
> >> If it's the driver we are fine touching hardware, if it's the device it
> >> depends on how we access the device resource and whether the subsystem
> >> has protection in place to handle access after the device is gone. I
> >> think USB can handle and block device access up until the disconnect
> >> callback has finished (no point in doing so though, since the normal
> >> operation is that the device is gone, not the driver unloading).
> >>
> >> Is there a way to determine who's going away without changes to the
> >> device core?
> >>
> >> Maybe. The driver can only be unregistered if there are no open file
> >> handles because a ref is taken on the driver module.
> > 
> > This isn't true. You can (not many people do, but it's possible) to unbind
> > through sysfs. The module reference only keeps the cpu instructions valid,
> > nothing else.
> > 
> >> So maybe something along these lines:
> >>
> >> int drm_dev_open_count(struct drm_device *dev)
> >> {
> >> 	int open_count;
> >>
> >> 	mutex_lock(&drm_global_mutex);
> >> 	open_count = dev->open_count;
> >> 	mutex_unlock(&drm_global_mutex);
> > 
> > Random style nit: READ_ONCE, no locking needed (the locks don't change
> > anything, except if you have really strange locking rules). Serves
> > double-duty as a huge warning sign that something tricky is happening.
> > 
> >> 	return open_count;
> >> }
> >>
> >> static int driver_remove()
> >> {
> >> 	drm_dev_unregister();
> >>
> >> 	open_count = drm_dev_open_count();
> >>
> >> 	/* Open fd's, device is going away, block access */
> >> 	if (open_count)
> >> 		drm_dev_set_unplugged();
> >>
> >> 	drm_atomic_helper_shutdown();
> >>
> >> 	/* No open fd's, driver is going away */
> >> 	if (!open_count)
> >> 		drm_dev_set_unplugged();
> >> }
> >>
> >>
> >> Personally I have 2 use cases:
> >> - tinydrm SPI drivers
> >>   The only hotpluggable SPI controllers I know of are USB which should
> >>   handle device access while unregistering.
> >>   SPI devices can be removed, but the controller driver is still in
> >>   place so it's safe.
> >> - A future USB driver (that I hope to work on when all this core stuff
> >>   is in place).
> >>   There's no point in touching the hw, so DRM can be set uplugged right
> >>   away in the disconnect() callback.
> >>
> >> This means that I don't need to know who's going away for my use cases.
> >>
> >> This turned out to be quite long winding, but the bottom line is that
> >> having a separate function to set the unplugged state seems to give a
> >> lot of flexibility for various use cases.
> >>
> >> I hope I didn't muddy the waters even more :-)
> > 
> > Hm, I think I get your idea. Since I'm still not sure what the best option
> > is I'm leaning towards just leaving stuff as-is. I.e. drivers which want
> > to shut down hw can do the 1. drm_dev_unregister() 2.
> > drm_atomic_helper_shutdown() sequence. Drivers which care about hotunplug
> > more can do just the drm_dev_unplug().
> > 
> > Yes it's messy and unsatisfactory, but in case of serious doubt I like to
> > wait until maybe in the future we have a good idea. Meanwhile leaving a
> > bit of a mess around is better than charging ahead in a possibly totally
> > wrong direction.
> > 
> > There's cases where clue still hasn't hit me, even years later (or anyone
> > else). That's just how it is sometimes.
> > 
> 
> Ok, I'll drop this.
> 
> This means that I'll drop devm_drm_dev_init() as well since it doesn't
> play well with drm_dev_unplug() since both will do drm_dev_put(). Not a
> big deal really, it just means that I need to add one error path in the
> probe function so I can drm_dev_put() on error.

Hm, I think we could just move the drm_dev_put out from drm_dev_unplug.
And then encourage everyone to use devm_drm_dev_init() everywhere. I do
like to get started with auto-cleaned up in drm somehow, and
devm_drm_dev_init doing the drm_dev_put() looks like a really good idea.

> Are you still ok with the first bug fix patch in this series?

Yeah that one still looks good.

Cheers, Daniel

> 
> > Zooming out more looking at the big picture I'd say all your work in the
> > past few years has enormously simplified drm for simple drivers already.
> > If we can't resolve this one here right now that just means you "only"
> > made drm 98% simpler instead of maybe 99%. It's still an epic win :-)
> > 
> 
> Thanks, your mentoring and reviews helped turn my rough ideas into
> something useful :-)
> 
> Noralf.
> 
> > Cheers, Daniel
> > 
> > 
> >> Noralf.
> >>
> >>> Thanks, Daniel
> >>>
> >>>>
> >>>> Noralf.
> >>>>
> >>>>> Cheers, Daniel
> >>>>>
> >>>>>> Noralf.
> >>>>>>
> >>>>>>>> + * in order to disable the hardware on regular driver module unload.
> >>>>>>>>   */
> >>>>>>>>  void drm_dev_unregister(struct drm_device *dev)
> >>>>>>>>  {
> >>>>>>>> @@ -845,6 +839,15 @@ void drm_dev_unregister(struct drm_device *dev)
> >>>>>>>>  	if (drm_core_check_feature(dev, DRIVER_LEGACY))
> >>>>>>>>  		drm_lastclose(dev);
> >>>>>>>>  
> >>>>>>>> +	/*
> >>>>>>>> +	 * After synchronizing any critical read section is guaranteed to see
> >>>>>>>> +	 * the new value of ->unplugged, and any critical section which might
> >>>>>>>> +	 * still have seen the old value of ->unplugged is guaranteed to have
> >>>>>>>> +	 * finished.
> >>>>>>>> +	 */
> >>>>>>>> +	dev->unplugged = true;
> >>>>>>>> +	synchronize_srcu(&drm_unplug_srcu);
> >>>>>>>> +
> >>>>>>>>  	dev->registered = false;
> >>>>>>>>  
> >>>>>>>>  	drm_client_dev_unregister(dev);
> >>>>>>>> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> >>>>>>>> index ca46a45a9cce..c50696c82a42 100644
> >>>>>>>> --- a/include/drm/drm_drv.h
> >>>>>>>> +++ b/include/drm/drm_drv.h
> >>>>>>>> @@ -736,13 +736,11 @@ void drm_dev_unplug(struct drm_device *dev);
> >>>>>>>>   * drm_dev_is_unplugged - is a DRM device unplugged
> >>>>>>>>   * @dev: DRM device
> >>>>>>>>   *
> >>>>>>>> - * This function can be called to check whether a hotpluggable is unplugged.
> >>>>>>>> - * Unplugging itself is singalled through drm_dev_unplug(). If a device is
> >>>>>>>> - * unplugged, these two functions guarantee that any store before calling
> >>>>>>>> - * drm_dev_unplug() is visible to callers of this function after it completes
> >>>>>>>> + * This function can be called to check whether @dev is unregistered. This can
> >>>>>>>> + * be used to detect that the underlying parent device is gone.
> >>>>>>>
> >>>>>>> I think it'd be good to keep the first part, and just update the reference
> >>>>>>> to drm_dev_unregister. So:
> >>>>>>>
> >>>>>>>  * This function can be called to check whether a hotpluggable is unplugged.
> >>>>>>>  * Unplugging itself is singalled through drm_dev_unregister(). If a device is
> >>>>>>>  * unplugged, these two functions guarantee that any store before calling
> >>>>>>>  * drm_dev_unregister() is visible to callers of this function after it
> >>>>>>>  * completes.
> >>>>>>>
> >>>>>>> I think your version shrugs a few important details under the rug. With
> >>>>>>> those nits addressed:
> >>>>>>>
> >>>>>>> Reviewed-by: Daniel Vetter <daniel.vetter at ffwll.ch>
> >>>>>>>
> >>>>>>> Cheers, Daniel
> >>>>>>>
> >>>>>>>>   *
> >>>>>>>> - * WARNING: This function fundamentally races against drm_dev_unplug(). It is
> >>>>>>>> - * recommended that drivers instead use the underlying drm_dev_enter() and
> >>>>>>>> + * WARNING: This function fundamentally races against drm_dev_unregister(). It
> >>>>>>>> + * is recommended that drivers instead use the underlying drm_dev_enter() and
> >>>>>>>>   * drm_dev_exit() function pairs.
> >>>>>>>>   */
> >>>>>>>>  static inline bool drm_dev_is_unplugged(struct drm_device *dev)
> >>>>>>>> -- 
> >>>>>>>> 2.20.1
> >>>>>>>>
> >>>>>>>> _______________________________________________
> >>>>>>>> Intel-gfx mailing list
> >>>>>>>> Intel-gfx at lists.freedesktop.org
> >>>>>>>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> >>>>>>>
> >>>>>
> >>>
> > 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


More information about the amd-gfx mailing list