[Intel-gfx] [PATCH 1/9] drm/connector: Make the drm_sysfs connector->kdev device hold a reference to the connector
Hans de Goede
hdegoede at redhat.com
Fri Apr 30 13:32:39 UTC 2021
Hi,
On 4/30/21 1:38 PM, Daniel Vetter wrote:
> On Fri, Apr 30, 2021 at 1:28 PM Hans de Goede <hdegoede at redhat.com> wrote:
>>
>> Hi,
>>
>> On 4/29/21 9:09 PM, Daniel Vetter wrote:
>>> On Thu, Apr 29, 2021 at 02:33:17PM +0200, Hans de Goede wrote:
>>>> Hi,
>>>>
>>>> On 4/29/21 2:04 PM, Daniel Vetter wrote:
>>>>> On Thu, Apr 29, 2021 at 01:54:46PM +0200, Greg Kroah-Hartman wrote:
>>>>>> On Thu, Apr 29, 2021 at 01:40:28PM +0200, Daniel Vetter wrote:
>>>>>>> On Wed, Apr 28, 2021 at 11:52:49PM +0200, Hans de Goede wrote:
>>>>>>>> Userspace could hold open a reference to the connector->kdev device,
>>>>>>>> through e.g. holding a sysfs-atrtribute open after
>>>>>>>> drm_sysfs_connector_remove() has been called. In this case the connector
>>>>>>>> could be free-ed while the connector->kdev device's drvdata is still
>>>>>>>> pointing to it.
>>>>>>>>
>>>>>>>> Give drm_connector devices there own device type, which allows
>>>>>>>> us to specify our own release function and make drm_sysfs_connector_add()
>>>>>>>> take a reference on the connector object, and have the new release
>>>>>>>> function put the reference when the device is released.
>>>>>>>>
>>>>>>>> Giving drm_connector devices there own device type, will also allow
>>>>>>>> checking if a device is a drm_connector device with a
>>>>>>>> "if (device->type == &drm_sysfs_device_connector)" check.
>>>>>>>>
>>>>>>>> Note that the setting of the name member of the device_type struct will
>>>>>>>> cause udev events for drm_connector-s to now contain DEVTYPE=drm_connector
>>>>>>>> as extra info. So this extends the uevent part of the userspace API.
>>>>>>>>
>>>>>>>> Signed-off-by: Hans de Goede <hdegoede at redhat.com>
>>>>>>>
>>>>>>> Are you sure? I thought sysfs is supposed to flush out any pending
>>>>>>> operations (they complete fast) and handle open fd internally?
>>>>>>
>>>>>> Yes, it "should" :)
>>>>>
>>>>> Thanks for confirming my vague memories :-)
>>>>>
>>>>> Hans, pls drop this one.
>>>>
>>>> Please see my earlier reply to your review of this patch, it is
>>>> still needed but for a different reason:
>>>>
>>>> """
>>>> We still need this change though to make sure that the
>>>> "drm/connector: Add drm_connector_find_by_fwnode() function"
>>>> does not end up following a dangling drvdat pointer from one
>>>> if the drm_connector kdev-s.
>>>>
>>>> The class_dev_iter_init() in drm_connector_find_by_fwnode() gets
>>>> a reference on all devices and between getting that reference
>>>> and it calling drm_connector_get() - drm_connector_unregister()
>>>> may run and drop the possibly last reference to the
>>>> drm_connector object, freeing it and leaving the kdev's
>>>> drvdata as a dangling pointer.
>>>> """
>>>>
>>>> This is actually why I added it initially, and while adding it
>>>> I came up with this wrong theory of why it was necessary independently
>>>> of the drm_connector_find_by_fwnode() addition, sorry about that.
>>>
>>> Generally that's handled by a kref_get_unless_zero under the protection of
>>> the lock which protects the weak reference. Which I think is the right
>>> model here (at a glance at least) since this is a lookup function.
>>
>> I'm afraid that things are a bit more complicated here. The idea here
>> is that we have a subsystem outside of the DRM subsystem which received
>> a hotplug event for a drm-connector. The only info which this subsystem
>> has is a reference on the fwnode level (either through device-tree or
>> to platform-code instantiating software-fwnode-s + links for this).
>>
>> So in order to deliver the hotplug event to the connector we need
>> to lookup the connector by fwnode.
>>
>> I've chosen to implement this by iterating over all drm_class
>> devices with a dev_type of drm_connector using class_dev_iter_init()
>> and friends. This makes sure that we either get a reference to
>> the device, or that we skip the device if it is being deleted.
>>
>> But this just gives us a reference to the connector->kdev, not
>> to the connector itself. A pointer to the connector itself is stored
>> as drvdata inside the device, but without taking a reference as
>> this patch does, there is no guarantee that that pointer does not
>> point to possibly free-ed mem.
>>
>> We could set drvdata to 0 from drm_sysfs_connector_remove()
>> Before calling device_unregister(connector->kdev) and then do
>> something like this inside drm_connector_find_by_fwnode():
>>
>> /*
>> * Lock the device to ensure we either see the drvdata == NULL
>> * set by drm_sysfs_connector_remove(); or we block the removal
>> * from continuing until we are done with the device.
>> */
>> device_lock(dev);
>> connector = dev_get_drvdata(dev);
>> if (connector && connector->fwnode == fwnode) {
>> drm_connector_get(connector);
>> found = connector;
>> }
>> device_unlock(dev);
>
> Yes this is what I mean. Except not a drm_connector_get, but a
> kref_get_unless_zero. The connector might already be on it's way out,
> but the drvdata not yet cleared.
The function we race with is drm_sysfs_connector_remove() and either:
1. The lookup wins the race in which case drm_sysfs_connector_remove()
can only complete after the drm_connector_get(); and the connector
kref won't drop to 0 before drm_sysfs_connector_remove() completes; or
2. drm_sysfs_connector_remove() wins the race in which case drvdata will
be 0.
So using kref_get_unless_zero here will not make a difference and
requires poking inside the drm_connector internals.
Note I will probably go with your suggestion below, so whether or
not to use kref_get_unless_zero here is likely no longer relevant.
>> With the device_lock() synchronizing against the device_lock()
>> in device_unregister(connector->kdev). So that we either see
>> drvdata == NULL if we race with unregistering; or we get
>> a reference on the drm_connector obj before its ref-count can
>> drop to 0.
>
> The trouble is that most connectors aren't full drivers on their kdev.
> So this isn't the right lock. We need another lock which protects the
> drvdata pointer appropriately for drm connectors.
>
>> There might be places though where we call code take the device_lock
>> while holding a lock necessary for the drm_connector_get() , so
>> this approach might lead to an AB BA deadlock. As such I think
>> my original approach is better (also see below).
>>
>>> Lookup tables holding full references tends to lead to all kinds of bad
>>> side effects.
>>
>> The proposed reference is not part of a lookup list, it is a
>> reference from the kdev on the drm_connector object which gets
>> dropped as soon as the kdev's refcount hits 0, which normally
>> happens directly after drm_connector_unregister() has run.
>
> Yeah but the way you use it is for lookup purposes. What we're
> implementing is the "get me the drm_connector for this fwnode"
> functionality, and that _is_ a lookup.
Ack.
> How its implemented is an
> internal detail really, and somehow using full references for lookup
> functionality isn't great.
Ok, note that the caller of this only needs the reference for a
short while, what the caller does is:
connector = drm_connector_find_by_fwnode(dp->connector_fwnode);
if (connector) {
drm_connector_oob_hotplug_event(connector, &data);
drm_connector_put(connector);
}
As a result of out discussion I have been thinking about enforcing this
short-lifetime of the reference by changing:
void drm_connector_oob_hotplug_event(struct drm_connector *connector,
struct drm_connector_oob_hotplug_event_data *data);
to:
void drm_connector_oob_hotplug_event(struct fwnode_handle connector_fwnode,
struct drm_connector_oob_hotplug_event_data *data);
And making that do the lookup (+ almost immediate put) internally, making
the connector-lookup a purely drm-subsys internal thing and enforcing code
outside of the drm-subsys not holding a long-time reference to the connector
this way.
Please let me know if you prefer the variant where the connector lookup
details are hidden from the callers ?
Then I can change this for for v2 of this patch/series.
> I'm also not sure why we have to use the kdev stuff here. For other
> random objects we need to look up we're building that functionality on
> that object. It means you need to keep another list_head around for
> that lookup, but that's really not a big cost. E.g. drm_bridge/panel
> work like that.
Using class_for_each_dev seemed like a good way to iterate over all
the connectors. But given the discussion this has caused, just adding
a new static list + mutex for this to drivers/gpu/drm/drm_connector.c
sounds like it might be a better approach indeed.
So shall I change thing over to this approach for v2 of this patch/series?
Regards,
Hans
More information about the Intel-gfx
mailing list