[Intel-gfx] [PATCH 2/6] drm/drv: Prepare to remove drm_dev_unplug()
Noralf Trønnes
noralf at tronnes.org
Tue Feb 5 17:57:50 UTC 2019
Den 05.02.2019 17.31, skrev Daniel Vetter:
> On Tue, Feb 05, 2019 at 11:20:55AM +0100, Noralf Trønnes wrote:
>>
>>
>> Den 05.02.2019 10.11, skrev Daniel Vetter:
>>> On Mon, Feb 04, 2019 at 06:35:28PM +0100, Noralf Trønnes wrote:
>>>>
>>>>
>>>> Den 04.02.2019 16.41, skrev Daniel Vetter:
>>>>> On Sun, Feb 03, 2019 at 04:41:56PM +0100, Noralf Trønnes wrote:
>>>>>> The only thing now that makes drm_dev_unplug() special is that it sets
>>>>>> drm_device->unplugged. Move this code to drm_dev_unregister() so that we
>>>>>> can remove drm_dev_unplug().
>>>>>>
>>>>>> Signed-off-by: Noralf Trønnes <noralf at tronnes.org>
>>>>>> ---
>>>>
>>>> [...]
>>>>
>>>>>> drivers/gpu/drm/drm_drv.c | 27 +++++++++++++++------------
>>>>>> include/drm/drm_drv.h | 10 ++++------
>>>>>> 2 files changed, 19 insertions(+), 18 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
>>>>>> index 05bbc2b622fc..e0941200edc6 100644
>>>>>> --- a/drivers/gpu/drm/drm_drv.c
>>>>>> +++ b/drivers/gpu/drm/drm_drv.c
>>>>>> @@ -366,15 +366,6 @@ EXPORT_SYMBOL(drm_dev_exit);
>>>>>> */
>>>>>> void drm_dev_unplug(struct drm_device *dev)
>>>>>> {
>>>>>> - /*
>>>>>> - * After synchronizing any critical read section is guaranteed to see
>>>>>> - * the new value of ->unplugged, and any critical section which might
>>>>>> - * still have seen the old value of ->unplugged is guaranteed to have
>>>>>> - * finished.
>>>>>> - */
>>>>>> - dev->unplugged = true;
>>>>>> - synchronize_srcu(&drm_unplug_srcu);
>>>>>> -
>>>>>> drm_dev_unregister(dev);
>>>>>> drm_dev_put(dev);
>>>>>> }
>>>>>> @@ -832,11 +823,14 @@ EXPORT_SYMBOL(drm_dev_register);
>>>>>> * drm_dev_register() but does not deallocate the device. The caller must call
>>>>>> * drm_dev_put() to drop their final reference.
>>>>>> *
>>>>>> - * A special form of unregistering for hotpluggable devices is drm_dev_unplug(),
>>>>>> - * which can be called while there are still open users of @dev.
>>>>>> + * This function can be called while there are still open users of @dev as long
>>>>>> + * as the driver protects its device resources using drm_dev_enter() and
>>>>>> + * drm_dev_exit().
>>>>>> *
>>>>>> * This should be called first in the device teardown code to make sure
>>>>>> - * userspace can't access the device instance any more.
>>>>>> + * userspace can't access the device instance any more. Drivers that support
>>>>>> + * device unplug will probably want to call drm_atomic_helper_shutdown() first
>>>>>
>>>>> Read once more with a bit more coffee, spotted this:
>>>>>
>>>>> s/first/afterwards/ - shutting down the hw before we've taken it away from
>>>>> userspace is kinda the wrong way round. It should be the inverse of driver
>>>>> load, which is 1) allocate structures 2) prep hw 3) register driver with
>>>>> the world (simplified ofc).
>>>>>
>>>>
>>>> The problem is that drm_dev_unregister() sets the device as unplugged
>>>> and if drm_atomic_helper_shutdown() is called afterwards it's not
>>>> allowed to touch hardware.
>>>>
>>>> I know it's the wrong order, but the only way to do it in the right
>>>> order is to have a separate function that sets unplugged:
>>>>
>>>> drm_dev_unregister();
>>>> drm_atomic_helper_shutdown();
>>>> drm_dev_set_unplugged();
>>>
>>> Annoying ... but yeah calling _shutdown() before we stopped userspace is
>>> also not going to work. Because userspace could quickly re-enable
>>> something, and then the refcounts would be all wrong again and leaking
>>> objects.
>>>
>>
>> What happens with a USB device that is unplugged with open userspace,
>> will that leak objects?
>
> Maybe we've jumped to conclusions. drm_atomic_helper_shutdown() will run
> as normal, the only thing that should be skipped is actually touching the
> hw (as long as the driver doesn't protect too much with
> drm_dev_enter/exit). So all the software updates (including refcounting
> updates) will still be done. Ofc current udl is not yet atomic, so in
> reality something else happens.
>
> And we ofc still have the same issue that if you just unload the driver,
> then the hw will stay on (which might really confuse the driver on next
> load, when it assumes that it only gets loaded from cold boot where
> everything is off - which usually is the case on an arm soc at least).
>
>>> I get a bit the feeling we're over-optimizing here with trying to devm-ize
>>> drm_dev_register. Just getting drm_device correctly devm-ized is a big
>>> step forward already, and will open up a lot of TODO items across a lot of
>>> drivers. E.g. we could add a drm_dev_kzalloc, for allocating all the drm_*
>>> structs, which gets released together with drm_device. I think that's a
>>> much clearer path forward, I think we all agree that getting the kfree out
>>> of the driver codes is a good thing, and it would allow us to do this
>>> correctly.
>>>
>>> Then once we have that and rolled out to a few drivers we can reconsider
>>> the entire unregister/shutdown gordian knot here. Atm I just have no idea
>>> how to do this properly :-/
>>>
>>> Thoughts, other ideas?
>>>
>>
>> Yeah, I've come to the conclusion that devm_drm_dev_register() doesn't
>> make much sense if we need a driver remove callback anyways.
>
> Yup, that's what I meant with the above:
> - merge devm_drm_dev_register, use it a lot. This is definitely a good
> idea.
> - create a drm_dev_kzalloc, which automatically releases memory on the
> final drm_dev_put. Use it every in drivers for drm structures,
> especially in those drivers that currently use devm (which releases
> those allocations potentialy too early on unplug).
> - figure out the next steps after that
>
>> I think devm_drm_dev_init() makes sense because it yields a cleaner
>> probe() function. An additional benefit is that it requires a
>> drm_driver->release function which is a step in the right direction to
>> get the drm_device lifetime right.
>>
>> Do we agree that a drm_dev_set_unplugged() function is necessary to get
>> the remove/disconnect order right?
>>
>> What about drm_dev_unplug() maybe I should just leave it be?
>>
>> - amd uses drm_driver->unload, so that one takes some work to get right
>> to support unplug. It doesn't check the unplugged state, so really
>> doesn't need drm_dev_unplug() I guess. Do they have cards that can be
>> hotplugged?
>
> Yeah amd still uses ->load and ->unload, which is not great unfortunately.
> I just stumbled over that when I tried to make a series to disable the
> global drm_global_mutex for most drivers ...
>
>> - udl uses drm_driver->unload, doesn't use drm_atomic_helper_shutdown().
>> It has only one drm_dev_is_unplugged() check and that is in its
>> fbdev->open callback.
>
> udl isn't atomic, so can't use the atomic helpers. pre-atomic doesn't have
> refcounting issues when something is left on iirc. I think udl is written
> under the assumption that ->unload is always called for an unplug, never
> for an actual unload.
>
>> - xen calls drm_atomic_helper_shutdown() in its drm_driver->release
>> callback which I guess is not correct.
>
> Yeah this smells fishy. ->release is supposed to be for cleaning up kernel
> structures, not for cleaning up the hw. So maybe drm_mode_config_cleanup
> could be put there, not sure honestly. But definitely not _shutdown().
>
>> This is what it will look like with a set unplugged function:
>>
>> void drm_dev_unplug(struct drm_device *dev)
>> {
>> drm_dev_set_unplugged(dev);
>> drm_dev_unregister(dev);
>> drm_dev_put(dev);
>> }
>>
>> Hm, I should probably remove it to avoid further use of it since it's
>> not correct for a modern driver.
>
> I think something flew over my head ... what's the drm_dev_set_unplugged
> for? I think I'm not following you ...
>
Taking it a few steps back:
This patch proposes to move 'dev->unplugged = true' from
drm_dev_unplug() to drm_dev_unregister().
Additionally I proposed this change to the drm_dev_unregister() docs:
* This should be called first in the device teardown code to make sure
- * userspace can't access the device instance any more.
+ * userspace can't access the device instance any more. Drivers that
support
+ * device unplug will probably want to call
drm_atomic_helper_shutdown() first
+ * in order to disable the hardware on regular driver module unload.
Which would give a driver remove callback like this:
static int driver_remove()
{
drm_atomic_helper_shutdown();
drm_dev_unregister();
}
Your reaction was that drm_atomic_helper_shutdown() needs to be called
after drm_dev_unregister() to avoid a race resulting in leaked objects.
However if we call it afterwards, ->unplugged will be true and the
driver can't touch hardware.
Then I proposed moving the unplugged state setting to:
void drm_dev_set_unplugged(struct drm_device *dev)
{
dev->unplugged = true;
synchronize_srcu(&drm_unplug_srcu);
}
Now it is possible to avoid the race and still touch hardware:
static int driver_remove()
{
drm_dev_unregister();
drm_atomic_helper_shutdown();
drm_dev_set_unplugged();
}
But now I'm back to the question: Is it the driver or the device that is
going away?
If it's the driver we are fine touching hardware, if it's the device it
depends on how we access the device resource and whether the subsystem
has protection in place to handle access after the device is gone. I
think USB can handle and block device access up until the disconnect
callback has finished (no point in doing so though, since the normal
operation is that the device is gone, not the driver unloading).
Is there a way to determine who's going away without changes to the
device core?
Maybe. The driver can only be unregistered if there are no open file
handles because a ref is taken on the driver module.
So maybe something along these lines:
int drm_dev_open_count(struct drm_device *dev)
{
int open_count;
mutex_lock(&drm_global_mutex);
open_count = dev->open_count;
mutex_unlock(&drm_global_mutex);
return open_count;
}
static int driver_remove()
{
drm_dev_unregister();
open_count = drm_dev_open_count();
/* Open fd's, device is going away, block access */
if (open_count)
drm_dev_set_unplugged();
drm_atomic_helper_shutdown();
/* No open fd's, driver is going away */
if (!open_count)
drm_dev_set_unplugged();
}
Personally I have 2 use cases:
- tinydrm SPI drivers
The only hotpluggable SPI controllers I know of are USB which should
handle device access while unregistering.
SPI devices can be removed, but the controller driver is still in
place so it's safe.
- A future USB driver (that I hope to work on when all this core stuff
is in place).
There's no point in touching the hw, so DRM can be set uplugged right
away in the disconnect() callback.
This means that I don't need to know who's going away for my use cases.
This turned out to be quite long winding, but the bottom line is that
having a separate function to set the unplugged state seems to give a
lot of flexibility for various use cases.
I hope I didn't muddy the waters even more :-)
Noralf.
> Thanks, Daniel
>
>>
>> Noralf.
>>
>>> Cheers, Daniel
>>>
>>>> Noralf.
>>>>
>>>>>> + * in order to disable the hardware on regular driver module unload.
>>>>>> */
>>>>>> void drm_dev_unregister(struct drm_device *dev)
>>>>>> {
>>>>>> @@ -845,6 +839,15 @@ void drm_dev_unregister(struct drm_device *dev)
>>>>>> if (drm_core_check_feature(dev, DRIVER_LEGACY))
>>>>>> drm_lastclose(dev);
>>>>>>
>>>>>> + /*
>>>>>> + * After synchronizing any critical read section is guaranteed to see
>>>>>> + * the new value of ->unplugged, and any critical section which might
>>>>>> + * still have seen the old value of ->unplugged is guaranteed to have
>>>>>> + * finished.
>>>>>> + */
>>>>>> + dev->unplugged = true;
>>>>>> + synchronize_srcu(&drm_unplug_srcu);
>>>>>> +
>>>>>> dev->registered = false;
>>>>>>
>>>>>> drm_client_dev_unregister(dev);
>>>>>> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
>>>>>> index ca46a45a9cce..c50696c82a42 100644
>>>>>> --- a/include/drm/drm_drv.h
>>>>>> +++ b/include/drm/drm_drv.h
>>>>>> @@ -736,13 +736,11 @@ void drm_dev_unplug(struct drm_device *dev);
>>>>>> * drm_dev_is_unplugged - is a DRM device unplugged
>>>>>> * @dev: DRM device
>>>>>> *
>>>>>> - * This function can be called to check whether a hotpluggable is unplugged.
>>>>>> - * Unplugging itself is singalled through drm_dev_unplug(). If a device is
>>>>>> - * unplugged, these two functions guarantee that any store before calling
>>>>>> - * drm_dev_unplug() is visible to callers of this function after it completes
>>>>>> + * This function can be called to check whether @dev is unregistered. This can
>>>>>> + * be used to detect that the underlying parent device is gone.
>>>>>
>>>>> I think it'd be good to keep the first part, and just update the reference
>>>>> to drm_dev_unregister. So:
>>>>>
>>>>> * This function can be called to check whether a hotpluggable is unplugged.
>>>>> * Unplugging itself is singalled through drm_dev_unregister(). If a device is
>>>>> * unplugged, these two functions guarantee that any store before calling
>>>>> * drm_dev_unregister() is visible to callers of this function after it
>>>>> * completes.
>>>>>
>>>>> I think your version shrugs a few important details under the rug. With
>>>>> those nits addressed:
>>>>>
>>>>> Reviewed-by: Daniel Vetter <daniel.vetter at ffwll.ch>
>>>>>
>>>>> Cheers, Daniel
>>>>>
>>>>>> *
>>>>>> - * WARNING: This function fundamentally races against drm_dev_unplug(). It is
>>>>>> - * recommended that drivers instead use the underlying drm_dev_enter() and
>>>>>> + * WARNING: This function fundamentally races against drm_dev_unregister(). It
>>>>>> + * is recommended that drivers instead use the underlying drm_dev_enter() and
>>>>>> * drm_dev_exit() function pairs.
>>>>>> */
>>>>>> static inline bool drm_dev_is_unplugged(struct drm_device *dev)
>>>>>> --
>>>>>> 2.20.1
>>>>>>
>>>>>> _______________________________________________
>>>>>> Intel-gfx mailing list
>>>>>> Intel-gfx at lists.freedesktop.org
>>>>>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>>>>>
>>>
>
More information about the amd-gfx
mailing list