[Intel-gfx] [PATCH 2/6] drm/drv: Prepare to remove drm_dev_unplug()
Noralf Trønnes
noralf at tronnes.org
Wed Feb 6 16:46:51 UTC 2019
Den 06.02.2019 16.26, skrev Daniel Vetter:
> On Tue, Feb 05, 2019 at 06:57:50PM +0100, Noralf Trønnes wrote:
>>
>>
>> Den 05.02.2019 17.31, skrev Daniel Vetter:
>>> On Tue, Feb 05, 2019 at 11:20:55AM +0100, Noralf Trønnes wrote:
>>>>
>>>>
>>>> Den 05.02.2019 10.11, skrev Daniel Vetter:
>>>>> On Mon, Feb 04, 2019 at 06:35:28PM +0100, Noralf Trønnes wrote:
>>>>>>
>>>>>>
>>>>>> Den 04.02.2019 16.41, skrev Daniel Vetter:
>>>>>>> On Sun, Feb 03, 2019 at 04:41:56PM +0100, Noralf Trønnes wrote:
>>>>>>>> The only thing now that makes drm_dev_unplug() special is that it sets
>>>>>>>> drm_device->unplugged. Move this code to drm_dev_unregister() so that we
>>>>>>>> can remove drm_dev_unplug().
>>>>>>>>
>>>>>>>> Signed-off-by: Noralf Trønnes <noralf at tronnes.org>
>>>>>>>> ---
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>>>> drivers/gpu/drm/drm_drv.c | 27 +++++++++++++++------------
>>>>>>>> include/drm/drm_drv.h | 10 ++++------
>>>>>>>> 2 files changed, 19 insertions(+), 18 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
>>>>>>>> index 05bbc2b622fc..e0941200edc6 100644
>>>>>>>> --- a/drivers/gpu/drm/drm_drv.c
>>>>>>>> +++ b/drivers/gpu/drm/drm_drv.c
>>>>>>>> @@ -366,15 +366,6 @@ EXPORT_SYMBOL(drm_dev_exit);
>>>>>>>> */
>>>>>>>> void drm_dev_unplug(struct drm_device *dev)
>>>>>>>> {
>>>>>>>> - /*
>>>>>>>> - * After synchronizing any critical read section is guaranteed to see
>>>>>>>> - * the new value of ->unplugged, and any critical section which might
>>>>>>>> - * still have seen the old value of ->unplugged is guaranteed to have
>>>>>>>> - * finished.
>>>>>>>> - */
>>>>>>>> - dev->unplugged = true;
>>>>>>>> - synchronize_srcu(&drm_unplug_srcu);
>>>>>>>> -
>>>>>>>> drm_dev_unregister(dev);
>>>>>>>> drm_dev_put(dev);
>>>>>>>> }
>>>>>>>> @@ -832,11 +823,14 @@ EXPORT_SYMBOL(drm_dev_register);
>>>>>>>> * drm_dev_register() but does not deallocate the device. The caller must call
>>>>>>>> * drm_dev_put() to drop their final reference.
>>>>>>>> *
>>>>>>>> - * A special form of unregistering for hotpluggable devices is drm_dev_unplug(),
>>>>>>>> - * which can be called while there are still open users of @dev.
>>>>>>>> + * This function can be called while there are still open users of @dev as long
>>>>>>>> + * as the driver protects its device resources using drm_dev_enter() and
>>>>>>>> + * drm_dev_exit().
>>>>>>>> *
>>>>>>>> * This should be called first in the device teardown code to make sure
>>>>>>>> - * userspace can't access the device instance any more.
>>>>>>>> + * userspace can't access the device instance any more. Drivers that support
>>>>>>>> + * device unplug will probably want to call drm_atomic_helper_shutdown() first
>>>>>>>
>>>>>>> Read once more with a bit more coffee, spotted this:
>>>>>>>
>>>>>>> s/first/afterwards/ - shutting down the hw before we've taken it away from
>>>>>>> userspace is kinda the wrong way round. It should be the inverse of driver
>>>>>>> load, which is 1) allocate structures 2) prep hw 3) register driver with
>>>>>>> the world (simplified ofc).
>>>>>>>
>>>>>>
>>>>>> The problem is that drm_dev_unregister() sets the device as unplugged
>>>>>> and if drm_atomic_helper_shutdown() is called afterwards it's not
>>>>>> allowed to touch hardware.
>>>>>>
>>>>>> I know it's the wrong order, but the only way to do it in the right
>>>>>> order is to have a separate function that sets unplugged:
>>>>>>
>>>>>> drm_dev_unregister();
>>>>>> drm_atomic_helper_shutdown();
>>>>>> drm_dev_set_unplugged();
>>>>>
>>>>> Annoying ... but yeah calling _shutdown() before we stopped userspace is
>>>>> also not going to work. Because userspace could quickly re-enable
>>>>> something, and then the refcounts would be all wrong again and leaking
>>>>> objects.
>>>>>
>>>>
>>>> What happens with a USB device that is unplugged with open userspace,
>>>> will that leak objects?
>>>
>>> Maybe we've jumped to conclusions. drm_atomic_helper_shutdown() will run
>>> as normal, the only thing that should be skipped is actually touching the
>>> hw (as long as the driver doesn't protect too much with
>>> drm_dev_enter/exit). So all the software updates (including refcounting
>>> updates) will still be done. Ofc current udl is not yet atomic, so in
>>> reality something else happens.
>>>
>>> And we ofc still have the same issue that if you just unload the driver,
>>> then the hw will stay on (which might really confuse the driver on next
>>> load, when it assumes that it only gets loaded from cold boot where
>>> everything is off - which usually is the case on an arm soc at least).
>>>
>>>>> I get a bit the feeling we're over-optimizing here with trying to devm-ize
>>>>> drm_dev_register. Just getting drm_device correctly devm-ized is a big
>>>>> step forward already, and will open up a lot of TODO items across a lot of
>>>>> drivers. E.g. we could add a drm_dev_kzalloc, for allocating all the drm_*
>>>>> structs, which gets released together with drm_device. I think that's a
>>>>> much clearer path forward, I think we all agree that getting the kfree out
>>>>> of the driver codes is a good thing, and it would allow us to do this
>>>>> correctly.
>>>>>
>>>>> Then once we have that and rolled out to a few drivers we can reconsider
>>>>> the entire unregister/shutdown gordian knot here. Atm I just have no idea
>>>>> how to do this properly :-/
>>>>>
>>>>> Thoughts, other ideas?
>>>>>
>>>>
>>>> Yeah, I've come to the conclusion that devm_drm_dev_register() doesn't
>>>> make much sense if we need a driver remove callback anyways.
>>>
>>> Yup, that's what I meant with the above:
>>> - merge devm_drm_dev_register, use it a lot. This is definitely a good
>>> idea.
>>> - create a drm_dev_kzalloc, which automatically releases memory on the
>>> final drm_dev_put. Use it every in drivers for drm structures,
>>> especially in those drivers that currently use devm (which releases
>>> those allocations potentialy too early on unplug).
>>> - figure out the next steps after that
>>>
>>>> I think devm_drm_dev_init() makes sense because it yields a cleaner
>>>> probe() function. An additional benefit is that it requires a
>>>> drm_driver->release function which is a step in the right direction to
>>>> get the drm_device lifetime right.
>>>>
>>>> Do we agree that a drm_dev_set_unplugged() function is necessary to get
>>>> the remove/disconnect order right?
>>>>
>>>> What about drm_dev_unplug() maybe I should just leave it be?
>>>>
>>>> - amd uses drm_driver->unload, so that one takes some work to get right
>>>> to support unplug. It doesn't check the unplugged state, so really
>>>> doesn't need drm_dev_unplug() I guess. Do they have cards that can be
>>>> hotplugged?
>>>
>>> Yeah amd still uses ->load and ->unload, which is not great unfortunately.
>>> I just stumbled over that when I tried to make a series to disable the
>>> global drm_global_mutex for most drivers ...
>>>
>>>> - udl uses drm_driver->unload, doesn't use drm_atomic_helper_shutdown().
>>>> It has only one drm_dev_is_unplugged() check and that is in its
>>>> fbdev->open callback.
>>>
>>> udl isn't atomic, so can't use the atomic helpers. pre-atomic doesn't have
>>> refcounting issues when something is left on iirc. I think udl is written
>>> under the assumption that ->unload is always called for an unplug, never
>>> for an actual unload.
>>>
>>>> - xen calls drm_atomic_helper_shutdown() in its drm_driver->release
>>>> callback which I guess is not correct.
>>>
>>> Yeah this smells fishy. ->release is supposed to be for cleaning up kernel
>>> structures, not for cleaning up the hw. So maybe drm_mode_config_cleanup
>>> could be put there, not sure honestly. But definitely not _shutdown().
>>>
>>>> This is what it will look like with a set unplugged function:
>>>>
>>>> void drm_dev_unplug(struct drm_device *dev)
>>>> {
>>>> drm_dev_set_unplugged(dev);
>>>> drm_dev_unregister(dev);
>>>> drm_dev_put(dev);
>>>> }
>>>>
>>>> Hm, I should probably remove it to avoid further use of it since it's
>>>> not correct for a modern driver.
>>>
>>> I think something flew over my head ... what's the drm_dev_set_unplugged
>>> for? I think I'm not following you ...
>>>
>>
>> Taking it a few steps back:
>>
>> This patch proposes to move 'dev->unplugged = true' from
>> drm_dev_unplug() to drm_dev_unregister().
>>
>> Additionally I proposed this change to the drm_dev_unregister() docs:
>>
>> * This should be called first in the device teardown code to make sure
>> - * userspace can't access the device instance any more.
>> + * userspace can't access the device instance any more. Drivers that
>> support
>> + * device unplug will probably want to call
>> drm_atomic_helper_shutdown() first
>> + * in order to disable the hardware on regular driver module unload.
>>
>> Which would give a driver remove callback like this:
>>
>> static int driver_remove()
>> {
>> drm_atomic_helper_shutdown();
>> drm_dev_unregister();
>> }
>>
>> Your reaction was that drm_atomic_helper_shutdown() needs to be called
>> after drm_dev_unregister() to avoid a race resulting in leaked objects.
>> However if we call it afterwards, ->unplugged will be true and the
>> driver can't touch hardware.
>>
>> Then I proposed moving the unplugged state setting to:
>>
>> void drm_dev_set_unplugged(struct drm_device *dev)
>> {
>> dev->unplugged = true;
>> synchronize_srcu(&drm_unplug_srcu);
>> }
>>
>> Now it is possible to avoid the race and still touch hardware:
>>
>> static int driver_remove()
>> {
>> drm_dev_unregister();
>> drm_atomic_helper_shutdown();
>> drm_dev_set_unplugged();
>> }
>>
>> But now I'm back to the question: Is it the driver or the device that is
>> going away?
>>
>> If it's the driver we are fine touching hardware, if it's the device it
>> depends on how we access the device resource and whether the subsystem
>> has protection in place to handle access after the device is gone. I
>> think USB can handle and block device access up until the disconnect
>> callback has finished (no point in doing so though, since the normal
>> operation is that the device is gone, not the driver unloading).
>>
>> Is there a way to determine who's going away without changes to the
>> device core?
>>
>> Maybe. The driver can only be unregistered if there are no open file
>> handles because a ref is taken on the driver module.
>
> This isn't true. You can (not many people do, but it's possible) to unbind
> through sysfs. The module reference only keeps the cpu instructions valid,
> nothing else.
>
>> So maybe something along these lines:
>>
>> int drm_dev_open_count(struct drm_device *dev)
>> {
>> int open_count;
>>
>> mutex_lock(&drm_global_mutex);
>> open_count = dev->open_count;
>> mutex_unlock(&drm_global_mutex);
>
> Random style nit: READ_ONCE, no locking needed (the locks don't change
> anything, except if you have really strange locking rules). Serves
> double-duty as a huge warning sign that something tricky is happening.
>
>> return open_count;
>> }
>>
>> static int driver_remove()
>> {
>> drm_dev_unregister();
>>
>> open_count = drm_dev_open_count();
>>
>> /* Open fd's, device is going away, block access */
>> if (open_count)
>> drm_dev_set_unplugged();
>>
>> drm_atomic_helper_shutdown();
>>
>> /* No open fd's, driver is going away */
>> if (!open_count)
>> drm_dev_set_unplugged();
>> }
>>
>>
>> Personally I have 2 use cases:
>> - tinydrm SPI drivers
>> The only hotpluggable SPI controllers I know of are USB which should
>> handle device access while unregistering.
>> SPI devices can be removed, but the controller driver is still in
>> place so it's safe.
>> - A future USB driver (that I hope to work on when all this core stuff
>> is in place).
>> There's no point in touching the hw, so DRM can be set uplugged right
>> away in the disconnect() callback.
>>
>> This means that I don't need to know who's going away for my use cases.
>>
>> This turned out to be quite long winding, but the bottom line is that
>> having a separate function to set the unplugged state seems to give a
>> lot of flexibility for various use cases.
>>
>> I hope I didn't muddy the waters even more :-)
>
> Hm, I think I get your idea. Since I'm still not sure what the best option
> is I'm leaning towards just leaving stuff as-is. I.e. drivers which want
> to shut down hw can do the 1. drm_dev_unregister() 2.
> drm_atomic_helper_shutdown() sequence. Drivers which care about hotunplug
> more can do just the drm_dev_unplug().
>
> Yes it's messy and unsatisfactory, but in case of serious doubt I like to
> wait until maybe in the future we have a good idea. Meanwhile leaving a
> bit of a mess around is better than charging ahead in a possibly totally
> wrong direction.
>
> There's cases where clue still hasn't hit me, even years later (or anyone
> else). That's just how it is sometimes.
>
Ok, I'll drop this.
This means that I'll drop devm_drm_dev_init() as well since it doesn't
play well with drm_dev_unplug() since both will do drm_dev_put(). Not a
big deal really, it just means that I need to add one error path in the
probe function so I can drm_dev_put() on error.
Are you still ok with the first bug fix patch in this series?
> Zooming out more looking at the big picture I'd say all your work in the
> past few years has enormously simplified drm for simple drivers already.
> If we can't resolve this one here right now that just means you "only"
> made drm 98% simpler instead of maybe 99%. It's still an epic win :-)
>
Thanks, your mentoring and reviews helped turn my rough ideas into
something useful :-)
Noralf.
> Cheers, Daniel
>
>
>> Noralf.
>>
>>> Thanks, Daniel
>>>
>>>>
>>>> Noralf.
>>>>
>>>>> Cheers, Daniel
>>>>>
>>>>>> Noralf.
>>>>>>
>>>>>>>> + * in order to disable the hardware on regular driver module unload.
>>>>>>>> */
>>>>>>>> void drm_dev_unregister(struct drm_device *dev)
>>>>>>>> {
>>>>>>>> @@ -845,6 +839,15 @@ void drm_dev_unregister(struct drm_device *dev)
>>>>>>>> if (drm_core_check_feature(dev, DRIVER_LEGACY))
>>>>>>>> drm_lastclose(dev);
>>>>>>>>
>>>>>>>> + /*
>>>>>>>> + * After synchronizing any critical read section is guaranteed to see
>>>>>>>> + * the new value of ->unplugged, and any critical section which might
>>>>>>>> + * still have seen the old value of ->unplugged is guaranteed to have
>>>>>>>> + * finished.
>>>>>>>> + */
>>>>>>>> + dev->unplugged = true;
>>>>>>>> + synchronize_srcu(&drm_unplug_srcu);
>>>>>>>> +
>>>>>>>> dev->registered = false;
>>>>>>>>
>>>>>>>> drm_client_dev_unregister(dev);
>>>>>>>> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
>>>>>>>> index ca46a45a9cce..c50696c82a42 100644
>>>>>>>> --- a/include/drm/drm_drv.h
>>>>>>>> +++ b/include/drm/drm_drv.h
>>>>>>>> @@ -736,13 +736,11 @@ void drm_dev_unplug(struct drm_device *dev);
>>>>>>>> * drm_dev_is_unplugged - is a DRM device unplugged
>>>>>>>> * @dev: DRM device
>>>>>>>> *
>>>>>>>> - * This function can be called to check whether a hotpluggable is unplugged.
>>>>>>>> - * Unplugging itself is singalled through drm_dev_unplug(). If a device is
>>>>>>>> - * unplugged, these two functions guarantee that any store before calling
>>>>>>>> - * drm_dev_unplug() is visible to callers of this function after it completes
>>>>>>>> + * This function can be called to check whether @dev is unregistered. This can
>>>>>>>> + * be used to detect that the underlying parent device is gone.
>>>>>>>
>>>>>>> I think it'd be good to keep the first part, and just update the reference
>>>>>>> to drm_dev_unregister. So:
>>>>>>>
>>>>>>> * This function can be called to check whether a hotpluggable is unplugged.
>>>>>>> * Unplugging itself is singalled through drm_dev_unregister(). If a device is
>>>>>>> * unplugged, these two functions guarantee that any store before calling
>>>>>>> * drm_dev_unregister() is visible to callers of this function after it
>>>>>>> * completes.
>>>>>>>
>>>>>>> I think your version shrugs a few important details under the rug. With
>>>>>>> those nits addressed:
>>>>>>>
>>>>>>> Reviewed-by: Daniel Vetter <daniel.vetter at ffwll.ch>
>>>>>>>
>>>>>>> Cheers, Daniel
>>>>>>>
>>>>>>>> *
>>>>>>>> - * WARNING: This function fundamentally races against drm_dev_unplug(). It is
>>>>>>>> - * recommended that drivers instead use the underlying drm_dev_enter() and
>>>>>>>> + * WARNING: This function fundamentally races against drm_dev_unregister(). It
>>>>>>>> + * is recommended that drivers instead use the underlying drm_dev_enter() and
>>>>>>>> * drm_dev_exit() function pairs.
>>>>>>>> */
>>>>>>>> static inline bool drm_dev_is_unplugged(struct drm_device *dev)
>>>>>>>> --
>>>>>>>> 2.20.1
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Intel-gfx mailing list
>>>>>>>> Intel-gfx at lists.freedesktop.org
>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>>>>>>>
>>>>>
>>>
>
More information about the amd-gfx
mailing list