[PATCH 2/8] drm/xe: convert sysfs over to devm
Matthew Auld
matthew.auld at intel.com
Mon Apr 29 15:17:54 UTC 2024
On 29/04/2024 14:52, Lucas De Marchi wrote:
> On Mon, Apr 29, 2024 at 09:28:00AM GMT, Rodrigo Vivi wrote:
>> On Mon, Apr 29, 2024 at 01:14:38PM +0100, Matthew Auld wrote:
>>> Hot-unplugging the device seems to result in errors like:
>>>
>>> kobject_add_internal failed for tile0 with -EEXIST, don't try to
>>> register things with the same name in the same directory.
>>>
>>> We only remove the sysfs entries as part of drmm; however, that is
>>> tied to the lifetime of the driver instance and not the device
>>> underneath. Attempt to fix this by using devm for all of the
>>> remaining sysfs registration related to the device.
>>
>> hmmm... so basically we should use drmm only for the global module
>> stuff and devm for things that are per-device?
>
> that doesn't make much sense. drmm is supposed to run when the driver
> unbinds from the device... basically when all refcounts are gone with
> drm_dev_put(). Are we keeping a ref we shouldn't?
It's run when all refcounts are dropped for that particular drm_device,
but that is separate from the physical device underneath (struct
device). For example, if something still has an open driver fd, the drmm
release action is not going to be called until after that fd is also
closed. But in the meantime we might have already removed the PCI device
and re-attached it to a newly allocated drm_device/xe_driver instance,
as with hotunplug.
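To make that concrete, here is a rough sketch of the conversion pattern
(illustrative only, not the actual patch; tile_sysfs_init(),
tile_sysfs_fini() and the "tile0" name are just placeholders):

#include <linux/device.h>
#include <linux/kobject.h>

/* Hypothetical helpers for illustration; not the exact patch code. */
static void tile_sysfs_fini(void *arg)
{
        struct kobject *kobj = arg;

        /* Runs at device unbind via devm, so the "tile0" name is gone
         * before a freshly bound drm_device/xe_driver instance tries
         * to register it again. */
        kobject_put(kobj);
}

static int tile_sysfs_init(struct device *dev, struct kobject *parent)
{
        struct kobject *kobj;

        kobj = kobject_create_and_add("tile0", parent);
        if (!kobj)
                return -ENOMEM;

        /* devm: teardown is keyed to the struct device / driver
         * unbind, not to the final drm_dev_put() on the drm_device. */
        return devm_add_action_or_reset(dev, tile_sysfs_fini, kobj);
}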
For example, currently we don't even call basic teardown like guc_fini()
when removing the PCI device, but rather only when the drm_device is
released, which sounds quite broken.
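As a sketch of what that could look like instead (xe_guc_fini() and this
init signature are made up here to show the devm shape; the real series
may differ):

/* Sketch only: tie hardware teardown to device removal via devm. */
static void guc_fini_hw(void *arg)
{
        struct xe_guc *guc = arg;

        xe_guc_fini(guc);       /* hypothetical teardown helper */
}

int xe_guc_init(struct xe_device *xe, struct xe_guc *guc)
{
        /* ... hardware/firmware init ... */

        /* Teardown now runs at PCI unbind, while the struct device is
         * still around, instead of at the final drm_dev_put(). */
        return devm_add_action_or_reset(xe->drm.dev, guc_fini_hw, guc);
}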
So roughly: drmm is for drm_device software-level state, and devm is for
teardown that needs to happen when the underlying device is removed. See
also the doc for drmm:
https://elixir.bootlin.com/linux/v6.8-rc1/source/drivers/gpu/drm/drm_managed.c#L23
Also: https://docs.kernel.org/gpu/drm-uapi.html#device-hot-unplug
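i.e. something like the following, where both release callbacks are
hypothetical:

#include <drm/drm_managed.h>
#include <linux/device.h>

static void sw_state_release(struct drm_device *drm, void *arg)
{
        /* Pure software state: safe to defer to the last
         * drm_dev_put(), even if the PCI device is long gone. */
}

static void hw_state_release(void *arg)
{
        /* Touches the device (sysfs names, MMIO, firmware): must run
         * at unbind, while we still own the struct device. */
}

static int example_init(struct xe_device *xe)
{
        int err;

        /* drmm: runs once the drm_device refcount drops to zero. */
        err = drmm_add_action_or_reset(&xe->drm, sw_state_release, xe);
        if (err)
                return err;

        /* devm: runs when the driver unbinds from the PCI device. */
        return devm_add_action_or_reset(xe->drm.dev, hw_state_release, xe);
}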
>
> Lucas De Marchi