[PATCH 0/3] Use implicit kref infra
Luben Tuikov
luben.tuikov at amd.com
Wed Sep 2 14:59:58 UTC 2020
On 2020-09-02 02:52, Daniel Vetter wrote:
> On Tue, Sep 01, 2020 at 11:46:18PM -0400, Luben Tuikov wrote:
>> On 2020-09-01 21:42, Pan, Xinhui wrote:
>>> If you take a look at the below function, you should not use driver's release to free adev. As dev is embedded in adev.
>>
>> Do you mean "look at the function below", using "below" as an adverb?
>> "below" is not an adjective.
>>
>> I know dev is embedded in adev--I did that patchset.
>>
>>>
>>> 809 static void drm_dev_release(struct kref *ref)
>>> 810 {
>>> 811 struct drm_device *dev = container_of(ref, struct drm_device, ref);
>>> 812
>>> 813 if (dev->driver->release)
>>> 814 dev->driver->release(dev);
>>> 815
>>> 816 drm_managed_release(dev);
>>> 817
>>> 818 kfree(dev->managed.final_kfree);
>>> 819 }
>>
>> That's simple--this comes from change c6603c740e0e3
>> and it should be reverted. Simple as that.
>>
>> The version before this change was absolutely correct:
>>
>> static void drm_dev_release(struct kref *ref)
>> {
>> if (dev->driver->release)
>> dev->driver->release(dev);
>> else
>> drm_dev_fini(dev);
>> }
>>
>> Meaning, "the kref is now 0"--> if the driver
>> has a release, call it, else use our own.
>> But note that nothing can be assumed after this point,
>> about the existence of "dev".
>>
>> It is exactly because struct drm_device is statically
>> embedded into a container, struct amdgpu_device,
>> that this change above should be reverted.
>>
>> This is very similar to how fops has open/release
>> but no close. That is, the "release" is called
>> only when the last kref is released, i.e. when
>> kref goes from non-zero to zero.
>>
>> This uses the kref infrastructure which has been
>> around for about 20 years in the Linux kernel.
>>
>> I suggest reading the comments
>> in drm_dev.c mostly, "DOC: driver instance overview"
>> starting at line 240 onwards. This is right above
>> drm_put_dev(). There is actually an example of a driver
>> in the comment. Also the comment to drm_dev_init().
>>
>> Now, take a look at this:
>>
>> /**
>> * drm_dev_put - Drop reference of a DRM device
>> * @dev: device to drop reference of or NULL
>> *
>> * This decreases the ref-count of @dev by one. The device is destroyed if the
>> * ref-count drops to zero.
>> */
>> void drm_dev_put(struct drm_device *dev)
>> {
>> if (dev)
>> kref_put(&dev->ref, drm_dev_release);
>> }
>> EXPORT_SYMBOL(drm_dev_put);
>>
>> Two things:
>>
>> 1. It is us, who kzalloc the amdgpu device, which contains
>> the drm_device (you'll see this discussed in the reading
>> material I pointed to above). We do this because we're
>> probing the PCI device whether we'll work it it or not.
>>
>> 2. Using the kref infrastructure, when the ref goes to 0,
>> drm_dev_release is called. And here's the KEY:
>> Because WE allocated the container, we should free it--after the release
>> method is called, DRM cannot assume anything about the drm
>> device or the container. The "release" method is final.
>>
>> We allocate, we free. And we free only when the ref goes to 0.
>>
>> DRM can, in due time, "free" itself of the DRM device and stop
>> having knowledge of it--that's fine, but as long as the ref
>> is not 0, the amdgpu device and thus the contained DRM device,
>> cannot be freed.
>>
>>>
>>> You have to make another change something like
>>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
>>> index 13068fdf4331..2aabd2b4c63b 100644
>>> --- a/drivers/gpu/drm/drm_drv.c
>>> +++ b/drivers/gpu/drm/drm_drv.c
>>> @@ -815,7 +815,8 @@ static void drm_dev_release(struct kref *ref)
>>>
>>> drm_managed_release(dev);
>>>
>>> - kfree(dev->managed.final_kfree);
>>> + if (dev->driver->final_release)
>>> + dev->driver->final_release(dev);
>>> }
>>
>> No. What's this?
>> There is no such thing as "final" release, nor is there a "partial" release.
>> When the kref goes to 0, the device disappears. Simple.
>> If someone is using it, they should kref-get it, and when they're
>> done with it, they should kref-put it.
>>
>> The whole point is that this is done implicitly, via the kref infrastructure.
>> drm_dev_init() which we call in our PCI probe function, sets the kref to 1--all
>> as per the documentation I pointed you to above.
>>
>> Another point is that we can do some other stuff in the release
>> function, notify someone, write some registers, free memory we use
>> for that PCI device, etc.
>>
>> If the "managed resources" infrastructure wants to stay, it should hook
>> itself into drm_dev_fini() and into drm_dev_init() or drm_dev_register().
>> It shouldn't have to be so out-of-place like in patch 2/3 of this series,
>> where the drmm_add_final_kfree() is smack-dab in the middle of our PCI
>> discovery function, surrounded on top and bottom by drm_dev_init()
>> and drm_dev_register(). The "managed resources" infra should be non-invasive
>> and drivers shouldn't have to change to use it--it should be invisible to them.
>> Then our kref would just work.
>
> Unfortunately some part of that drm managed series is stuck still waiting
> for review (I guess I need to resubmit), but with that the docs would tell
> you to use devm_drm_dev_alloc instead of hand-rolling all this.
That's very very misleading--the documentation DOES NOT tell you this.
It tells you exactly how to do it using the "release" method
folded into the kref infrastructure. It is your changes
to the documentation which tell you to use this new alloc.
>
> Definitely not any of the ideas proposed in this discussion or patches :-)
These are nothing new--it's just standard DRM stuff described in the documentation
in drm_drv.c.
There are NO NEW IDEAS or anything "proposed" in this discussion or patches--it's
just standard stuff from DRM, which had been the case for the last 10 years,
until your "managed resources" came along. It's the managed resources
which are new.
Regards,
Luben
>
> I'll cc you on that series when I send it out again.
>
> Cheers, Daniel
>>
>>>
>>> And in the final_release callback we free the dev. But that is a little complex now. so I prefer still using final_kfree.
>>> Of course we can do some cleanup work in the driver's release callback. BUT no kfree.
>>
>> No! No final_kfree. It's a hack.
>>
>> Read the documentation in drm_drv.c I noted above--it lays out how this happens. Reading is required.
>>
>> Regards,
>> Luben
>>
>>
>>>
>>> -----原始邮件-----
>>> 发件人: "Tuikov, Luben" <Luben.Tuikov at amd.com>
>>> 日期: 2020年9月2日 星期三 09:07
>>> 收件人: "amd-gfx at lists.freedesktop.org" <amd-gfx at lists.freedesktop.org>, "dri-devel at lists.freedesktop.org" <dri-devel at lists.freedesktop.org>
>>> 抄送: "Deucher, Alexander" <Alexander.Deucher at amd.com>, Daniel Vetter <daniel at ffwll.ch>, "Pan, Xinhui" <Xinhui.Pan at amd.com>, "Tuikov, Luben" <Luben.Tuikov at amd.com>
>>> 主题: [PATCH 0/3] Use implicit kref infra
>>>
>>> Use the implicit kref infrastructure to free the container
>>> struct amdgpu_device, container of struct drm_device.
>>>
>>> First, in drm_dev_register(), do not indiscriminately warn
>>> when a DRM driver hasn't opted for managed.final_kfree,
>>> but instead check if the driver has provided its own
>>> "release" function callback in the DRM driver structure.
>>> If that is the case, no warning.
>>>
>>> Remove drmm_add_final_kfree(). We take care of that, in the
>>> kref "release" callback when all refs are down to 0, via
>>> drm_dev_put(), i.e. the free is implicit.
>>>
>>> Remove superfluous NULL check, since the DRM device to be
>>> suspended always exists, so long as the underlying PCI and
>>> DRM devices exist.
>>>
>>> Luben Tuikov (3):
>>> drm: No warn for drivers who provide release
>>> drm/amdgpu: Remove drmm final free
>>> drm/amdgpu: Remove superfluous NULL check
>>>
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 --
>>> drivers/gpu/drm/drm_drv.c | 3 ++-
>>> 3 files changed, 2 insertions(+), 6 deletions(-)
>>>
>>> --
>>> 2.28.0.394.ge197136389
>>>
>>>
>>>
>>
>
More information about the amd-gfx
mailing list