[PATCH 2/8] drm/xe: convert sysfs over to devm

Aravind Iddamsetty aravind.iddamsetty at linux.intel.com
Tue Apr 30 09:42:49 UTC 2024


On 30/04/24 14:13, Jani Nikula wrote:
> On Mon, 29 Apr 2024, Lucas De Marchi <lucas.demarchi at intel.com> wrote:
>> On Mon, Apr 29, 2024 at 02:45:26PM GMT, Rodrigo Vivi wrote:
>>> On Mon, Apr 29, 2024 at 04:17:54PM +0100, Matthew Auld wrote:
>>>> On 29/04/2024 14:52, Lucas De Marchi wrote:
>>>>> On Mon, Apr 29, 2024 at 09:28:00AM GMT, Rodrigo Vivi wrote:
>>>>>> On Mon, Apr 29, 2024 at 01:14:38PM +0100, Matthew Auld wrote:
>>>>>>> Hotunplugging the device seems to result in stuff like:
>>>>>>>
>>>>>>> kobject_add_internal failed for tile0 with -EEXIST, don't try to
>>>>>>> register things with the same name in the same directory.
>>>>>>>
>>>>>>> We only remove the sysfs as part of drmm, however that is tied to the
>>>>>>> lifetime of the driver instance and not the device underneath. Attempt
>>>>>>> to fix by using devm for all of the remaining sysfs stuff related to the
>>>>>>> device.
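
[For reference, the kind of conversion being described is roughly the sketch
below; tile_sysfs_fini() and the tile->sysfs field are illustrative names,
not quoted from the actual patch.]

/* Before: teardown registered against the drm_device, so it only runs once
 * the last drm_dev_put() drops the final reference -- which an open fd can
 * delay well past the point the PCI device has been removed. */
static void tile_sysfs_fini(struct drm_device *drm, void *arg)
{
	struct xe_tile *tile = arg;

	kobject_put(tile->sysfs);
}

	err = drmm_add_action_or_reset(&xe->drm, tile_sysfs_fini, tile);

/* After: teardown registered against the underlying struct device, so it
 * runs at driver unbind (PCI remove), which is when the sysfs entries have
 * to disappear so a re-plugged device can register them again. */
static void tile_sysfs_fini(void *arg)
{
	struct xe_tile *tile = arg;

	kobject_put(tile->sysfs);
}

	err = devm_add_action_or_reset(xe->drm.dev, tile_sysfs_fini, tile);
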
>>>>>> hmmm... so basically we should use the drmm only for the global module
>>>>>> stuff and the devm for things that are per device?
>>>>> that doesn't make much sense. drmm is supposed to run when the driver
>>>>> unbinds from the device... basically when all refcounts are gone with
>>>>> drm_dev_put().  Are we keeping a ref we shouldn't?
>>>> It's run when all refcounts are dropped for that particular drm_device, but
>>>> that is separate from the physical device underneath (struct device). For
>>>> example if something has an open driver fd the drmm release action is not
>>>> going to be called until after that is also closed. But in the meantime we
>>>> might have already removed the pci device and re-attached it to a newly
>>>> allocated drm_device/xe_driver instance, like with hotunplug.
>>>>
>>>> For example, currently we don't even call basic stuff like guc_fini() etc.
>>>> when removing the pci device, but rather when the drm_device is released,
>>>> which sounds quite broken.
>>>>
>>>> So roughly drmm is for drm_device software level stuff and devm is for stuff
>>>> that needs to happen when removing the device. See also the doc for drmm:
>>>> https://elixir.bootlin.com/linux/v6.8-rc1/source/drivers/gpu/drm/drm_managed.c#L23
>>>>
>>>> Also: https://docs.kernel.org/gpu/drm-uapi.html#device-hot-unplug
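
[Stated as code, the rule of thumb being described would look roughly like
this; sw_state_fini() and hw_fini() are made-up names, purely illustrative:]

	/* drmm: software state of this drm_device instance. Released from
	 * drm_dev_put() when the last reference goes away, possibly long
	 * after the physical device has been unplugged. */
	err = drmm_add_action_or_reset(&xe->drm, sw_state_fini, xe);
	if (err)
		return err;

	/* devm: anything tied to the physical device (sysfs, HW teardown).
	 * Released when the driver unbinds from the struct device, i.e. at
	 * PCI remove time, regardless of open fds. */
	err = devm_add_action_or_reset(xe->drm.dev, hw_fini, xe);
	if (err)
		return err;
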
>> yeah... I think you convinced me
> You've all also convinced me this is a PITA to get right for every
> contribution. If there's one thing I've learned, people will just cargo
> cult this stuff, and pick one or the other depending on what they happen
> to see. Needs vigilant review.
>
> BR,
> Jani.
>
>
>>> Cc: Aravind and Michal since this likely relates to the FLR discussion...
>>>
>>> but it looks to me that we should move more towards the devm_ and limit
>>> the usage of drmm_ to some very specific cases...

Hi Matt,

so if the previous drm_device instance is not destroyed and we create a new one,
I believe the drm_device name keeps changing. That is allowed from the driver's
point of view, but from the system's or the UMDs' point of view, can they be
expected to handle the card being renamed?

e.g. /dev/dri/card0 -> /dev/dri/card1

Thanks,
Aravind.
>> agreed,
>>
>> Lucas De Marchi
>>
>>>>> Lucas De Marchi

