[Intel-gfx] [PATCH 1/9] drm/connector: Make the drm_sysfs connector->kdev device hold a reference to the connector

Hans de Goede hdegoede at redhat.com
Fri Apr 30 11:28:20 UTC 2021


Hi,

On 4/29/21 9:09 PM, Daniel Vetter wrote:
> On Thu, Apr 29, 2021 at 02:33:17PM +0200, Hans de Goede wrote:
>> Hi,
>>
>> On 4/29/21 2:04 PM, Daniel Vetter wrote:
>>> On Thu, Apr 29, 2021 at 01:54:46PM +0200, Greg Kroah-Hartman wrote:
>>>> On Thu, Apr 29, 2021 at 01:40:28PM +0200, Daniel Vetter wrote:
>>>>> On Wed, Apr 28, 2021 at 11:52:49PM +0200, Hans de Goede wrote:
>>>>>> Userspace could hold open a reference to the connector->kdev device,
>>>>>> through e.g. holding a sysfs attribute open after
>>>>>> drm_sysfs_connector_remove() has been called. In this case the connector
>>>>>> could be freed while the connector->kdev device's drvdata is still
>>>>>> pointing to it.
>>>>>>
>>>>>> Give drm_connector devices their own device type, which allows
>>>>>> us to specify our own release function and make drm_sysfs_connector_add()
>>>>>> take a reference on the connector object, and have the new release
>>>>>> function put the reference when the device is released.
>>>>>>
>>>>>> Giving drm_connector devices their own device type will also allow
>>>>>> checking if a device is a drm_connector device with a
>>>>>> "if (device->type == &drm_sysfs_device_connector)" check.
>>>>>>
>>>>>> Note that the setting of the name member of the device_type struct will
>>>>>> cause udev events for drm_connector-s to now contain DEVTYPE=drm_connector
>>>>>> as extra info. So this extends the uevent part of the userspace API.
>>>>>>
>>>>>> Signed-off-by: Hans de Goede <hdegoede at redhat.com>
>>>>>
>>>>> Are you sure? I thought sysfs is supposed to flush out any pending
>>>>> operations (they complete fast) and handle open fd internally?
>>>>
>>>> Yes, it "should" :)
>>>
>>> Thanks for confirming my vague memories :-)
>>>
>>> Hans, pls drop this one.
>>
>> Please see my earlier reply to your review of this patch, it is
>> still needed but for a different reason:
>>
>> """
>> We still need this change though to make sure that the 
>> "drm/connector: Add drm_connector_find_by_fwnode() function"
>> does not end up following a dangling drvdata pointer from one
>> of the drm_connector kdevs.
>>
>> The class_dev_iter_init() in drm_connector_find_by_fwnode() gets
>> a reference on all devices, and between getting that reference
>> and calling drm_connector_get(), drm_connector_unregister()
>> may run and drop the possibly last reference to the
>> drm_connector object, freeing it and leaving the kdev's
>> drvdata as a dangling pointer.
>> """
>>
>> This is actually why I added it initially, and while adding it
>> I came up with this wrong theory of why it was necessary independently
>> of the drm_connector_find_by_fwnode() addition, sorry about that.
> 
> Generally that's handled by a kref_get_unless_zero under the protection of
> the lock which protects the weak reference, which I think is the right
> model here (at a glance at least) since this is a lookup function.
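
For reference, the kref_get_unless_zero pattern suggested above can be sketched as a small userspace C model. It assumes a one-entry lookup table (`weak_ref`) that holds no reference, a mutex (`lookup_lock`) protecting it, and a hand-rolled `get_unless_zero()` standing in for the kernel's kref_get_unless_zero(); all of these names are hypothetical and this is not DRM code. The property it illustrates: the final put unpublishes the weak reference under the same lock before freeing, so a lookup holding that lock can safely inspect the object and only takes a reference if the count has not already reached 0.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted object, modelling a kref-counted kernel object. */
struct obj {
	atomic_int refcount;
	int key;
};

/* The lock protecting the weak reference below. */
static pthread_mutex_t lookup_lock = PTHREAD_MUTEX_INITIALIZER;

/* Weak reference: a one-entry lookup table that holds no refcount. */
static struct obj *weak_ref;

/* Models kref_get_unless_zero(): take a reference only if refcount > 0. */
static bool get_unless_zero(struct obj *o)
{
	int old = atomic_load(&o->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;
	}
	return false;
}

static struct obj *obj_create(int key)
{
	struct obj *o = malloc(sizeof(*o));

	assert(o != NULL);
	atomic_init(&o->refcount, 1);
	o->key = key;
	pthread_mutex_lock(&lookup_lock);
	weak_ref = o;
	pthread_mutex_unlock(&lookup_lock);
	return o;
}

static void obj_put(struct obj *o)
{
	if (atomic_fetch_sub(&o->refcount, 1) != 1)
		return;
	/*
	 * Last reference dropped: unpublish under the lock before freeing,
	 * so a concurrent lookup either sees no entry or sees a zero count.
	 */
	pthread_mutex_lock(&lookup_lock);
	if (weak_ref == o)
		weak_ref = NULL;
	pthread_mutex_unlock(&lookup_lock);
	free(o);
}

/* Returns the matching object with a new reference held, or NULL. */
static struct obj *obj_lookup(int key)
{
	struct obj *found = NULL;

	pthread_mutex_lock(&lookup_lock);
	if (weak_ref && weak_ref->key == key && get_unless_zero(weak_ref))
		found = weak_ref;
	pthread_mutex_unlock(&lookup_lock);
	return found;
}
```

A lookup that races with the final put sees a count of 0 and returns NULL instead of resurrecting an object that is about to be freed.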

I'm afraid that things are a bit more complicated here. The idea here
is that we have a subsystem outside of the DRM subsystem which receives
a hotplug event for a drm-connector. The only info which this subsystem
has is a reference at the fwnode level (either through device-tree, or
through platform code instantiating software fwnodes plus links for this).

So in order to deliver the hotplug event to the connector, we need
to look up the connector by fwnode.

I've chosen to implement this by iterating over all drm_class
devices with a dev_type of drm_connector using class_dev_iter_init()
and friends. This makes sure that we either get a reference to
the device, or that we skip the device if it is being deleted.
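
The iteration described above can be modelled roughly as follows. This is a hypothetical single-threaded userspace sketch, not the class_dev_iter API: `struct device`, `being_deleted` and `find_dev_by_fwnode()` are made-up stand-ins, and the plain `int` refcount ignores concurrency. It shows that the iterator only pins the device; the connector reached through drvdata is not pinned by anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, single-threaded model of a device class; not kernel API. */
struct device_type {
	const char *name;
};

struct device {
	const struct device_type *type;
	int refcount;           /* the device (kdev) refcount */
	bool being_deleted;     /* set once unregistration has started */
	void *drvdata;          /* for connector devices: a struct connector */
};

struct connector {
	const void *fwnode;
};

static const struct device_type connector_type = { "drm_connector" };

/*
 * Walk the class devices, skipping devices that are being deleted or are
 * not connector devices, and hold a device reference while inspecting each
 * one. Returns the matching device with that reference still held, or NULL.
 */
static struct device *find_dev_by_fwnode(struct device *devs, size_t n,
					 const void *fwnode)
{
	for (size_t i = 0; i < n; i++) {
		struct device *dev = &devs[i];
		struct connector *conn;

		if (dev->being_deleted || dev->type != &connector_type)
			continue;

		dev->refcount++;        /* what the iterator's get does */
		conn = dev->drvdata;
		/*
		 * Only the device is pinned at this point, nothing pins *conn:
		 * without an extra reference, conn may already be freed memory.
		 */
		if (conn != NULL && conn->fwnode == fwnode)
			return dev;
		dev->refcount--;        /* what the iterator's put does */
	}
	return NULL;
}
```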

But this just gives us a reference to the connector->kdev, not
to the connector itself. A pointer to the connector itself is stored
as drvdata inside the device, but without taking a reference as
this patch does, there is no guarantee that this pointer does not
point to already freed memory.

We could set drvdata to NULL in drm_sysfs_connector_remove()
before calling device_unregister(connector->kdev) and then do
something like this inside drm_connector_find_by_fwnode():

/*
 * Lock the device to ensure we either see the drvdata == NULL
 * set by drm_sysfs_connector_remove(); or we block the removal
 * from continuing until we are done with the device.
 */
device_lock(dev);
connector = dev_get_drvdata(dev);
if (connector && connector->fwnode == fwnode) {
	drm_connector_get(connector);
	found = connector;
}
device_unlock(dev);

The device_lock() here synchronizes against the device_lock()
in device_unregister(connector->kdev), so that we either see
drvdata == NULL if we race with unregistering, or we get
a reference on the drm_connector obj before its refcount can
drop to 0.
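
A minimal userspace model of this device_lock() scheme might look like the sketch below. It assumes a pthread mutex stands in for device_lock(), an atomic counter stands in for the drm_connector refcount, and that the connector's owner only drops its reference after `connector_remove()` has run; `connector_remove()` and `lookup()` are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace model; names are illustrative, not kernel API. */
struct connector {
	atomic_int refcount;    /* stands in for the drm_connector refcount */
	const void *fwnode;
};

struct device {
	pthread_mutex_t lock;   /* stands in for device_lock() */
	void *drvdata;          /* the connector, or NULL once removed */
};

static void connector_init(struct connector *c, const void *fwnode)
{
	atomic_init(&c->refcount, 1);
	c->fwnode = fwnode;
}

static void device_init(struct device *dev, struct connector *c)
{
	pthread_mutex_init(&dev->lock, NULL);
	dev->drvdata = c;
}

/* Models drm_sysfs_connector_remove(): clear drvdata under the lock. */
static void connector_remove(struct device *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->drvdata = NULL;
	pthread_mutex_unlock(&dev->lock);
	/* device_unregister() and the final connector put would follow. */
}

/*
 * Models the lookup: under the lock we either see NULL, or a connector
 * whose owner cannot drop its reference (it must first get through
 * connector_remove()) before we have taken our own reference.
 */
static struct connector *lookup(struct device *dev, const void *fwnode)
{
	struct connector *c, *found = NULL;

	pthread_mutex_lock(&dev->lock);
	c = dev->drvdata;
	if (c != NULL && c->fwnode == fwnode) {
		atomic_fetch_add(&c->refcount, 1);
		found = c;
	}
	pthread_mutex_unlock(&dev->lock);
	return found;
}
```

Once `connector_remove()` has cleared drvdata, no later lookup can reach the connector, while a lookup that won the race already holds its own reference.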

There might be places, though, where we call code that takes the
device_lock while holding a lock necessary for drm_connector_get(), so
this approach might lead to an AB-BA deadlock. As such I think
my original approach is better (also see below).

> Lookup tables holding full references tends to lead to all kinds of bad
> side effects.

The proposed reference is not part of a lookup list, it is a
reference from the kdev on the drm_connector object which gets
dropped as soon as the kdev's refcount hits 0, which normally
happens directly after drm_connector_unregister() has run.

In many other places in the kernel, problems like this are
solved by embedding the device struct inside the containing
data struct (here the drm_connector struct), using the
device struct's refcounting for all refcounting, and using
the device struct's release callback as the release callback for
the entire object.
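
The embedding pattern described above looks roughly like this in a simplified userspace form. The `struct device`, `device_get()`/`device_put()` and `struct my_obj` here are hypothetical stand-ins (the kernel uses get_device()/put_device() and a release callback from the device, its type or its class); the point is that `container_of()` recovers the containing object, so one refcount and one release callback govern the whole allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Same idea as the kernel's container_of(), minus its type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, single-threaded stand-in for struct device. */
struct device {
	int refcount;
	void (*release)(struct device *dev);
};

/* Hypothetical containing object with the device embedded in it. */
struct my_obj {
	int payload;
	struct device dev;
};

static int objs_freed;  /* demonstration counter only */

/* The device's release callback frees the whole containing object. */
static void my_obj_release(struct device *dev)
{
	struct my_obj *obj = container_of(dev, struct my_obj, dev);

	free(obj);
	objs_freed++;
}

static void device_get(struct device *dev)
{
	dev->refcount++;
}

static void device_put(struct device *dev)
{
	if (--dev->refcount == 0)
		dev->release(dev);
}

static struct my_obj *my_obj_create(int payload)
{
	struct my_obj *obj = malloc(sizeof(*obj));

	assert(obj != NULL);
	obj->payload = payload;
	obj->dev.refcount = 1;
	obj->dev.release = my_obj_release;
	return obj;
}
```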

That is not doable here since the drm_object code has its own
refcounting going on. What this patch in essence does is
simulate having only a single refcount, by making sure the
main-object release callback does not get run until
the drm_object's refcount and the device's refcount have
both reached 0 (by keeping the drm_object's refcount at
a minimum of 1 as long as there are references to the
device).
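
The reference scheme described in the previous paragraph can be sketched as a single-threaded userspace model. `struct connector`, `struct kdev`, `sysfs_connector_add()` and `kdev_release()` are hypothetical stand-ins for the drm_connector, its connector->kdev, drm_sysfs_connector_add() and the new device_type release callback; plain `int` refcounts replace the real kref and device refcounts, so this is not thread-safe.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, single-threaded model of the patch's reference scheme. */
struct connector {
	int refcount;               /* stands in for the drm_connector refcount */
};

struct kdev {
	int refcount;               /* stands in for the struct device refcount */
	struct connector *drvdata;  /* the connector this kdev belongs to */
};

static int connectors_freed;    /* demonstration counter only */

static struct connector *connector_create(void)
{
	struct connector *c = malloc(sizeof(*c));

	assert(c != NULL);
	c->refcount = 1;
	return c;
}

static void connector_get(struct connector *c)
{
	c->refcount++;
}

static void connector_put(struct connector *c)
{
	if (--c->refcount == 0) {
		free(c);
		connectors_freed++;
	}
}

/* Stands in for the new device_type release callback. */
static void kdev_release(struct kdev *kdev)
{
	connector_put(kdev->drvdata);   /* drop the kdev's connector reference */
	free(kdev);
}

static void kdev_get(struct kdev *kdev)
{
	kdev->refcount++;
}

static void kdev_put(struct kdev *kdev)
{
	if (--kdev->refcount == 0)
		kdev_release(kdev);
}

/* Stands in for drm_sysfs_connector_add(): the kdev pins the connector. */
static struct kdev *sysfs_connector_add(struct connector *c)
{
	struct kdev *kdev = malloc(sizeof(*kdev));

	assert(kdev != NULL);
	kdev->refcount = 1;
	kdev->drvdata = c;
	connector_get(c);
	return kdev;
}
```

For example, if a lookup pins the kdev and then device_unregister() and the connector's owner drop their references, the connector's count stays at 1 thanks to the kdev's reference, so drvdata is still valid. The connector is only freed once the kdev has been released and the lookup has dropped any connector reference it took.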

Regards,

Hans


