[PATCH 0/3] Use implicit kref infra

Thu Sep 3 18:52:16 UTC 2020

On 2020-09-02 9:57 p.m., Pan, Xinhui wrote:
> 
> 
>> 2020年9月2日 22:50，Tuikov, Luben <Luben.Tuikov at amd.com> 写道：
>>
>> On 2020-09-02 00:43, Pan, Xinhui wrote:
>>>
>>>
>>>> 2020年9月2日 11:46，Tuikov, Luben <Luben.Tuikov at amd.com> 写道：
>>>>
>>>> On 2020-09-01 21:42, Pan, Xinhui wrote:
>>>>> If you take a look at the below function, you should not use driver's release to free adev. As dev is embedded in adev.
>>>>
>>>> Do you mean "look at the function below", using "below" as an adverb?
>>>> "below" is not an adjective.
>>>>
>>>> I know dev is embedded in adev--I did that patchset.
>>>>
>>>>>
>>>>> 809 static void drm_dev_release(struct kref *ref)
>>>>> 810 {
>>>>> 811         struct drm_device *dev = container_of(ref, struct drm_device, ref);
>>>>> 812        
>>>>> 813         if (dev->driver->release)
>>>>> 814                 dev->driver->release(dev);
>>>>> 815 
>>>>> 816         drm_managed_release(dev);
>>>>> 817 
>>>>> 818         kfree(dev->managed.final_kfree);
>>>>> 819 }
>>>>
>>>> That's simple--this comes from change c6603c740e0e3
>>>> and it should be reverted. Simple as that.
>>>>
>>>> The version before this change was absolutely correct:
>>>>
>>>> static void drm_dev_release(struct kref *ref)
>>>> {
>>>> 	if (dev->driver->release)
>>>> 		dev->driver->release(dev);
>>>> 	else
>>>> 		drm_dev_fini(dev);
>>>> }
>>>>
>>>> Meaning, "the kref is now 0"--> if the driver
>>>> has a release, call it, else use our own.
>>>> But note that nothing can be assumed after this point,
>>>> about the existence of "dev".
>>>>
>>>> It is exactly because struct drm_device is statically
>>>> embedded into a container, struct amdgpu_device,
>>>> that this change above should be reverted.
>>>>
>>>> This is very similar to how fops has open/release
>>>> but no close. That is, the "release" is called
>>>> only when the last kref is released, i.e. when
>>>> kref goes from non-zero to zero.
>>>>
>>>> This uses the kref infrastructure which has been
>>>> around for about 20 years in the Linux kernel.
>>>>
>>>> I suggest reading the comments
>>>> in drm_dev.c mostly, "DOC: driver instance overview"
>>>> starting at line 240 onwards. This is right above
>>>> drm_put_dev(). There is actually an example of a driver
>>>> in the comment. Also the comment to drm_dev_init().
>>>>
>>>> Now, take a look at this:
>>>>
>>>> /**
>>>> * drm_dev_put - Drop reference of a DRM device
>>>> * @dev: device to drop reference of or NULL
>>>> *
>>>> * This decreases the ref-count of @dev by one. The device is destroyed if the
>>>> * ref-count drops to zero.
>>>> */
>>>> void drm_dev_put(struct drm_device *dev)
>>>> {
>>>>       if (dev)
>>>>               kref_put(&dev->ref, drm_dev_release);
>>>> }
>>>> EXPORT_SYMBOL(drm_dev_put);
>>>>
>>>> Two things:
>>>>
>>>> 1. It is us, who kzalloc the amdgpu device, which contains
>>>> the drm_device (you'll see this discussed in the reading
>>>> material I pointed to above). We do this because we're
>>>> probing the PCI device whether we'll work it it or not.
>>>>
>>>
>>> that is true.
>>
>> Of course it's true--good morning!
>>
>>> My understanding of the drm core code is like something below.
>>
>> Let me stop you right there--just read the documentation I pointed
>> to you at.
>>
>>> struct B { 
>>> 	strcut A 
>>> }
>>> we initialize A firstly and initialize B in the end. But destroy B firstly and destory A in the end.
>>
>> No!
>> B, which is the amdgpu_device struct "exists" before A, which is the DRM struct.
>> This is why DRM recommends to _embed_ it into the driver's own device struct,
>> as the documentation I pointed you to at.
>>
> I think you are misleading me here.  A pci dev as you said below can act as many roles, a drm dev, a tty dev, etc.
> say, struct B{
> struct A;
> struct TTY;
> struct printer;
> ...
> }
> but TTY or other members has nothing to do with our discussion.
> 
> B of course exists before A. but the code logic is not that. code below is really rare in drm world.
> create_B()
> {
> 	init B members
> 	return create_A()
> }
> So usually B have more work to do after it initialize A.
> then code should like below
> create_B()
> {
> 	init B base members
> 	create_A()
> 	init B extended members
> }
> 
> 
> For release part.
> release B extended member
> release A
> release B base member
> 
> a good design should not have the so-called extended and base members existing in the release process.
> Now have a look at the drm core code.
> it expects driver to do release process like below.
> release B
> cleanup work of A
> 
> as long as the cleanup work of A exists, we can not do a pure release of B.
> 
> So if you want to follow the ruls of kref, you have to rework the drm core code first. only after that, we can do a pure release of B.
> 
> What I am confused is that, kfer sits in drm dev. why adev must be destroyed too when drm dev is going to be destroyed.
> adev is not equal to drm dev.
> I think struct below is more fit for the logic.
> struct adev {
> 	struct drm * ddev_p = &adev.ddev
> 	struct type *odev_p  = &adev.odev
> 	struct drm ddev
> 	struct type odev
> }

I've not much idea what you said above.

Consider that DRM dev is embedded into amdgpu_device, as opposed to
be a pointer to an allocated DRM dev (which used to be the case before
my patch making the driver device struct a container and DRM a member).
And as such, DRM dev embedded into amdgpu_device, then either both
exist or neither exists, allocated in memory.

DRM layer can finish with a DRM device, fine.
amdgpu driver can finish with an amdgpu device, fine.
They can happen out of order.

But the beauty of kref, is that regardless of how a driver
or layer or a third entity uses and later relinquishes
a device it is working on, only when the kref goes to 0,
is the memory of, in our case both DRM device struct and
amdgpu_device struct both, freed.

And as such, this happens implicitly, without explicit logic
or tracking. All that is required is that if an entity wants
to use a struct, it calls kref-get, and when it is done using
it, it calls kref-put.

The new half-baked managed resources, ported from devres, obviates this
a bit, and smudges this clean separation, because now with it,
the release method isn't terminal (final).

> 
>> "undone" first, since the DRM layer may finish with a device, but
>> the device may still exists with the driver and as well as with PCI.
>> This is very VERY common, with kernels, devices, device abstractions,
>> device layers: DRM dev <-- amdgpu dev <-- PCI dev.
>>
>>> But yes, practice is more complex. 
>>> if B has nothing to be destroyed. we can destory A directly, otherwise destroy B firstly.
>>
>> I'm sorry, that doesn't make sense. There is no such thing as "destroy directly"
>> and "otherwise"--this is absolutely not how this works.
>>
>> A good architecture doesn't have if-then-else--it's just a pure single-branch path.
> 
> well, there is code below everywhere.
> if (fops->release)
> 	fops->release
> else
> 	default_release

Yep, exactly! And this is the 3rd or 4th time I'm pasting
this, lets go for one more:

static void drm_dev_release(struct kref *ref)
{
	if (dev->driver->release)
		dev->driver->release(dev);
	else
		drm_dev_fini(dev);
}

Is how drm_dev_release() used to look like, before
"managed resources" came in. When you look at it,
it is clear what is happening.

You can have managed resources in DRM, but it should
be a bit more elegant and implicit, and possibly
preserve this finality of the release method.

For instance, one could simply decree that drivers
would allocate and free their container structure,
using new functions (as "DRM managed resources"
does right now), but without changing the release
method, thus leaving the finality to the driver.

At this point the driver may call a helper function
in DRM or, DRM can add a function to be called
in drm_dev_release() _before_ calling the driver's
release method.

Right now what we have in amdgpu is non-symmetric
allocation scheme, since with your patch, release
is missing.

Regards,
Luben

> 
>>>
>>> in this case, we can do something below in our release()
>>> //some cleanup work of B
>>> drm_dev_fini(dev);//destroy A
>>> kfree(adev)
>>>
>>>> 2. Using the kref infrastructure, when the ref goes to 0,
>>>> drm_dev_release is called. And here's the KEY:
>>>> Because WE allocated the container, we should free it--after the release
>>>> method is called, DRM cannot assume anything about the drm
>>>> device or the container. The "release" method is final.
>>>>
>>>> We allocate, we free. And we free only when the ref goes to 0.
>>>>
>>>> DRM can, in due time, "free" itself of the DRM device and stop
>>>> having knowledge of it--that's fine, but as long as the ref
>>>> is not 0, the amdgpu device and thus the contained DRM device,
>>>> cannot be freed.
>>>>
>>>>>
>>>>> You have to make another change something like
>>>>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
>>>>> index 13068fdf4331..2aabd2b4c63b 100644
>>>>> --- a/drivers/gpu/drm/drm_drv.c
>>>>> +++ b/drivers/gpu/drm/drm_drv.c
>>>>> @@ -815,7 +815,8 @@ static void drm_dev_release(struct kref *ref)
>>>>>
>>>>>       drm_managed_release(dev);
>>>>>
>>>>> -       kfree(dev->managed.final_kfree);
>>>>> +       if (dev->driver->final_release)
>>>>> +               dev->driver->final_release(dev);
>>>>> }
>>>>
>>>> No. What's this?
>>>> There is no such thing as "final" release, nor is there a "partial" release.
>>>> When the kref goes to 0, the device disappears. Simple.
>>>> If someone is using it, they should kref-get it, and when they're
>>>> done with it, they should kref-put it.
>>>
>>> I just take an example here. add another release in the end. then no one could touch us. IOW, final_release.
>>
>> No, that's horrible.
>>
>>>
>>>
>>> A destroy B by a callback, then A destroy itself. It assumes B just free its own resource.
>>> but that makes trouble if some resource of A is allocated by B.
>>> Because B must take care of these common resource shared between A and B.
>>
>> No, that's horrible.
>>
>>>
>>> yes, that logical is more complex. So I think we can revert drm_dev_release to its previous version.
>>
>> drm_dev_release() in its original form, was pure:
>>
>> static void drm_dev_release(struct kref *ref)
>> {
>> 	if (dev->driver->release)
>> 		dev->driver->release(dev);
>> 	else
>> 		drm_dev_fini(dev);
>> }
>>
>>>
>>>>
>>>> The whole point is that this is done implicitly, via the kref infrastructure.
>>>> drm_dev_init() which we call in our PCI probe function, sets the kref to 1--all
>>>> as per the documentation I pointed you to above.
>>>
>>> I am not taking about kref. what we are discussing is all about the release sequence.
>>
>> You need to understand how the kref infrastructure works in the kernel. I've said
>> it a million times: it's implicit. The "release sequence" as you like to call it,
>> is implicit in the kref infrastructure.
>>
>>>
>>>
>>>>
>>>> Another point is that we can do some other stuff in the release
>>>> function, notify someone, write some registers, free memory we use
>>>> for that PCI device, etc.
>>>>
>>>> If the "managed resources" infrastructure wants to stay, it should hook
>>>> itself into drm_dev_fini() and into drm_dev_init() or drm_dev_register().
>>>> It shouldn't have to be so out-of-place like in patch 2/3 of this series,
>>>> where the drmm_add_final_kfree() is smack-dab in the middle of our PCI
>>>> discovery function, surrounded on top and bottom by drm_dev_init()
>>>> and drm_dev_register(). The "managed resources" infra should be non-invasive
>>>> and drivers shouldn't have to change to use it--it should be invisible to them.
>>>> Then our kref would just work.
>>>>
>>> yep, that make sense. But you need more changes to fix this issue. this patchset is insufficient.
>>
>> Or LESS. Less changes. Less is better. Basically revert and redo all this "managed resources".
>>
>> Regards,
>> Luben
>>
>>>
>>>
>>>>>
>>>>> And in the final_release callback we free the dev. But that is a little complex now. so I prefer still using final_kfree.
>>>>> Of course we can do some cleanup work in the driver's release callback. BUT no kfree.
>>>>
>>>> No! No final_kfree. It's a hack.
>>>>
>>>> Read the documentation in drm_drv.c I noted above--it lays out how this happens. Reading is required.
>>>>
>>>> Regards,
>>>> Luben
>>>>
>>>>
>>>>>
>>>>> -----原始邮件-----
>>>>> 发件人: "Tuikov, Luben" <Luben.Tuikov at amd.com>
>>>>> 日期: 2020年9月2日 星期三 09:07
>>>>> 收件人: "amd-gfx at lists.freedesktop.org" <amd-gfx at lists.freedesktop.org>, "dri-devel at lists.freedesktop.org" <dri-devel at lists.freedesktop.org>
>>>>> 抄送: "Deucher, Alexander" <Alexander.Deucher at amd.com>, Daniel Vetter <daniel at ffwll.ch>, "Pan, Xinhui" <Xinhui.Pan at amd.com>, "Tuikov, Luben" <Luben.Tuikov at amd.com>
>>>>> 主题: [PATCH 0/3] Use implicit kref infra
>>>>>
>>>>>   Use the implicit kref infrastructure to free the container
>>>>>   struct amdgpu_device, container of struct drm_device.
>>>>>
>>>>>   First, in drm_dev_register(), do not indiscriminately warn
>>>>>   when a DRM driver hasn't opted for managed.final_kfree,
>>>>>   but instead check if the driver has provided its own
>>>>>   "release" function callback in the DRM driver structure.
>>>>>   If that is the case, no warning.
>>>>>
>>>>>   Remove drmm_add_final_kfree(). We take care of that, in the
>>>>>   kref "release" callback when all refs are down to 0, via
>>>>>   drm_dev_put(), i.e. the free is implicit.
>>>>>
>>>>>   Remove superfluous NULL check, since the DRM device to be
>>>>>   suspended always exists, so long as the underlying PCI and
>>>>>   DRM devices exist.
>>>>>
>>>>>   Luben Tuikov (3):
>>>>>     drm: No warn for drivers who provide release
>>>>>     drm/amdgpu: Remove drmm final free
>>>>>     drm/amdgpu: Remove superfluous NULL check
>>>>>
>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ---
>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 2 --
>>>>>    drivers/gpu/drm/drm_drv.c                  | 3 ++-
>>>>>    3 files changed, 2 insertions(+), 6 deletions(-)
>>>>>
>>>>>   -- 
>>>>>   2.28.0.394.ge197136389
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>