[PATCH 0/3] Use implicit kref infra

Luben Tuikov luben.tuikov at amd.com
Wed Sep 2 03:46:18 UTC 2020


On 2020-09-01 21:42, Pan, Xinhui wrote:
> If you take a look at the below function, you should not use driver's release to free adev. As dev is embedded in adev.

Do you mean "look at the function below", using "below" as an adverb?
"below" is not an adjective.

I know dev is embedded in adev--I did that patchset.

> 
> static void drm_dev_release(struct kref *ref)
> {
>         struct drm_device *dev = container_of(ref, struct drm_device, ref);
> 
>         if (dev->driver->release)
>                 dev->driver->release(dev);
> 
>         drm_managed_release(dev);
> 
>         kfree(dev->managed.final_kfree);
> }

That's simple--this comes from change c6603c740e0e3
and it should be reverted. Simple as that.

The version before this change was absolutely correct:

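static void drm_dev_release(struct kref *ref)
{
	/* Recover the device from the kref embedded in it. */
	struct drm_device *dev = container_of(ref, struct drm_device, ref);

	if (dev->driver->release)
		dev->driver->release(dev);
	else
		drm_dev_fini(dev);
}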

Meaning: "the kref is now 0", so if the driver
has a release, call it; else use our own.
But note that nothing can be assumed about the
existence of "dev" after this point.

It is exactly because struct drm_device is statically
embedded into a container, struct amdgpu_device,
that this change above should be reverted.

This is very similar to how fops has open/release
but no close. That is, the "release" is called
only when the last kref is released, i.e. when
kref goes from non-zero to zero.

This uses the kref infrastructure which has been
around for about 20 years in the Linux kernel.
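
To make the "non-zero to zero" point concrete, here is a minimal
userspace sketch. It is not the kernel's kref: the struct and the
three helpers are simplified stand-ins with the same names, no
atomics, and count_release() is a hypothetical callback that only
counts how often it runs.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's struct kref (not the real one). */
struct kref {
	int refcount;
};

static void kref_init(struct kref *k) { k->refcount = 1; }
static void kref_get(struct kref *k) { k->refcount++; }

/* Returns 1 if this put dropped the last reference and release() ran. */
static int kref_put(struct kref *k, void (*release)(struct kref *k))
{
	assert(k->refcount > 0);
	if (--k->refcount == 0) {
		release(k);
		return 1;
	}
	return 0;
}

static int release_calls;	/* counts how often release ran */

/* Hypothetical release callback: it only records that it was called. */
static void count_release(struct kref *k)
{
	(void)k;
	release_calls++;
}

/* Two holders: release runs only on the second (last) put. */
static int demo_last_put_releases(void)
{
	struct kref k;
	int first, second;

	release_calls = 0;
	kref_init(&k);				/* count = 1 */
	kref_get(&k);				/* count = 2 */
	first = kref_put(&k, count_release);	/* 2 -> 1, no release */
	second = kref_put(&k, count_release);	/* 1 -> 0, release runs */
	return first == 0 && second == 1 && release_calls == 1;
}
```

There is no separate "close" step anywhere: the release is a side
effect of the last put, whoever happens to do it.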

I suggest reading the comments in drm_drv.c,
mostly "DOC: driver instance overview",
starting at line 240. This is right above
drm_put_dev(). There is actually an example of a driver
in the comment. Also see the comment to drm_dev_init().

Now, take a look at this:

/**
 * drm_dev_put - Drop reference of a DRM device
 * @dev: device to drop reference of or NULL
 *
 * This decreases the ref-count of @dev by one. The device is destroyed if the
 * ref-count drops to zero.
 */
void drm_dev_put(struct drm_device *dev)
{
        if (dev)
                kref_put(&dev->ref, drm_dev_release);
}
EXPORT_SYMBOL(drm_dev_put);

Two things:

1. It is us who kzalloc the amdgpu device, which contains
the drm_device (you'll see this discussed in the reading
material I pointed to above). We do this because we're
probing the PCI device, whether we'll work with it or not.

2. Using the kref infrastructure, when the ref goes to 0,
drm_dev_release is called. And here's the KEY:
Because WE allocated the container, we should free it--after the release
method is called, DRM cannot assume anything about the drm
device or the container. The "release" method is final.

We allocate, we free. And we free only when the ref goes to 0.

DRM can, in due time, "free" itself of the DRM device and stop
having knowledge of it--that's fine, but as long as the ref
is not 0, the amdgpu device and thus the contained DRM device,
cannot be freed.
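
As a rough userspace sketch of "we allocate, we free": the names
below (fake_dev, fake_adev, fake_driver, fake_probe, fake_dev_put)
are made up for illustration and only mimic the shape of
drm_device embedded in amdgpu_device. The assumption is a single
allocation for the container, freed by the driver's own release.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct fake_dev;

struct fake_driver {
	void (*release)(struct fake_dev *dev);
};

/* Stand-in for struct drm_device: holds the refcount. */
struct fake_dev {
	int refcount;
	const struct fake_driver *driver;
};

/* Stand-in for struct amdgpu_device: the device is embedded in it. */
struct fake_adev {
	int scratch;
	struct fake_dev ddev;
};

static int containers_freed;

/* Driver release: the driver allocated the container, so it frees it. */
static void fake_adev_release(struct fake_dev *dev)
{
	struct fake_adev *adev = container_of(dev, struct fake_adev, ddev);

	free(adev);	/* dev lives inside adev and is gone after this */
	containers_freed++;
}

static const struct fake_driver fake_drv = { .release = fake_adev_release };

/* Probe-time allocation: one allocation covers container and device. */
static struct fake_dev *fake_probe(void)
{
	struct fake_adev *adev = calloc(1, sizeof(*adev));

	if (!adev)
		return NULL;
	adev->ddev.refcount = 1;
	adev->ddev.driver = &fake_drv;
	return &adev->ddev;
}

/* Core-side put: hand off to the driver and touch nothing afterwards. */
static void fake_dev_put(struct fake_dev *dev)
{
	if (!dev)
		return;
	assert(dev->refcount > 0);
	if (--dev->refcount == 0 && dev->driver->release)
		dev->driver->release(dev);
	/* no access to dev past this point */
}

/* One reference, one put: the container is freed exactly once. */
static int demo_container_freed_once(void)
{
	struct fake_dev *dev = fake_probe();

	if (!dev)
		return -1;
	containers_freed = 0;
	fake_dev_put(dev);
	return containers_freed;
}
```

Note that fake_dev_put() does not read or write dev after calling
release. A kfree() of anything hanging off the device at that point
would be a use-after-free in this layout.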

> 
> You have to make another change something like
> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> index 13068fdf4331..2aabd2b4c63b 100644
> --- a/drivers/gpu/drm/drm_drv.c
> +++ b/drivers/gpu/drm/drm_drv.c
> @@ -815,7 +815,8 @@ static void drm_dev_release(struct kref *ref)
>  
>         drm_managed_release(dev);
>  
> -       kfree(dev->managed.final_kfree);
> +       if (dev->driver->final_release)
> +               dev->driver->final_release(dev);
>  }

No. What's this?
There is no such thing as "final" release, nor is there a "partial" release.
When the kref goes to 0, the device disappears. Simple.
If someone is using it, they should kref-get it, and when they're
done with it, they should kref-put it.

The whole point is that this is done implicitly, via the kref infrastructure.
drm_dev_init() which we call in our PCI probe function, sets the kref to 1--all
as per the documentation I pointed you to above.
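
A small userspace sketch of the get/put discipline, assuming a
made-up struct obj in place of the DRM device. The creator starts
with a count of 1, as after init, and a second user keeps the object
alive after the creator drops its reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for the device. */
struct obj {
	int refcount;
	int *released;	/* set to 1 when the object is released */
};

static struct obj *obj_create(int *released_flag)
{
	struct obj *o = malloc(sizeof(*o));

	if (!o)
		return NULL;
	o->refcount = 1;	/* creator's reference, as after init */
	o->released = released_flag;
	*released_flag = 0;
	return o;
}

static void obj_get(struct obj *o)
{
	assert(o->refcount > 0);	/* only get what you already hold */
	o->refcount++;
}

static void obj_put(struct obj *o)
{
	if (o && --o->refcount == 0) {
		*o->released = 1;
		free(o);
	}
}

/* The user's reference outlives the creator's reference. */
static int demo_user_keeps_object_alive(void)
{
	int released;
	int alive_after_owner_put;
	struct obj *o = obj_create(&released);

	if (!o)
		return -1;
	obj_get(o);	/* a user starts using the object */
	obj_put(o);	/* the creator drops its initial reference */
	alive_after_owner_put = !released;	/* user's ref still holds it */
	obj_put(o);	/* the user is done: count hits 0, object is freed */
	return alive_after_owner_put && released;
}
```

Nobody decides "now it is final". The free happens on whichever put
is the last one.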

Another point is that we can do some other stuff in the release
function, notify someone, write some registers, free memory we use
for that PCI device, etc.

If the "managed resources" infrastructure wants to stay, it should hook
itself into drm_dev_fini() and into drm_dev_init() or drm_dev_register().
It shouldn't have to be so out-of-place like in patch 2/3 of this series,
where the drmm_add_final_kfree() is smack-dab in the middle of our PCI
discovery function, surrounded on top and bottom by drm_dev_init()
and drm_dev_register(). The "managed resources" infra should be non-invasive
and drivers shouldn't have to change to use it--it should be invisible to them.
Then our kref would just work.
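
One possible reading of this, as a userspace sketch only: the core
keeps its own list of cleanup actions, sets it up in its init and
drains it in its fini, and the driver never has to register a
"final free" with it. All names here (core_dev, core_dev_init,
core_add_action, core_dev_fini) are hypothetical and are not the
drmm API.

```c
#include <assert.h>

#define MAX_ACTIONS 8

typedef void (*cleanup_fn)(void *data);

/* Hypothetical core device with an internal cleanup list. */
struct core_dev {
	cleanup_fn actions[MAX_ACTIONS];
	void *action_data[MAX_ACTIONS];
	int nr_actions;
};

/* Core-side init: sets up the managed list by itself. */
static void core_dev_init(struct core_dev *dev)
{
	dev->nr_actions = 0;
}

/* Core-internal: queue a cleanup action; returns 0 on success. */
static int core_add_action(struct core_dev *dev, cleanup_fn fn, void *data)
{
	if (dev->nr_actions == MAX_ACTIONS)
		return -1;
	dev->actions[dev->nr_actions] = fn;
	dev->action_data[dev->nr_actions] = data;
	dev->nr_actions++;
	return 0;
}

/* Core-side fini: runs queued actions newest-first, frees nothing else. */
static void core_dev_fini(struct core_dev *dev)
{
	while (dev->nr_actions > 0) {
		dev->nr_actions--;
		dev->actions[dev->nr_actions](dev->action_data[dev->nr_actions]);
	}
}

static int order[MAX_ACTIONS];
static int order_len;

/* Test action: records the order in which actions run. */
static void record(void *data)
{
	assert(order_len < MAX_ACTIONS);
	order[order_len++] = *(int *)data;
}

/* Fini drains the list in reverse order of registration. */
static int demo_fini_runs_actions(void)
{
	static int a = 1, b = 2;
	struct core_dev dev;

	order_len = 0;
	core_dev_init(&dev);
	if (core_add_action(&dev, record, &a) ||
	    core_add_action(&dev, record, &b))
		return -1;
	core_dev_fini(&dev);	/* what a driver's release would call */
	return order_len == 2 && order[0] == 2 && order[1] == 1;
}
```

In this arrangement the driver's release calls the core's fini and
then frees its own container, and nothing between init and register
has to change.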

> 
> And in the final_release callback we free the dev. But that is a little complex now. so I prefer still using final_kfree.
> Of course we can do some cleanup work in the driver's release callback. BUT no kfree.

No! No final_kfree. It's a hack.

Read the documentation in drm_drv.c I noted above--it lays out how this happens. Reading is required.

Regards,
Luben


> 
> -----Original Message-----
> From: "Tuikov, Luben" <Luben.Tuikov at amd.com>
> Date: Wednesday, September 2, 2020, 09:07
> To: "amd-gfx at lists.freedesktop.org" <amd-gfx at lists.freedesktop.org>, "dri-devel at lists.freedesktop.org" <dri-devel at lists.freedesktop.org>
> Cc: "Deucher, Alexander" <Alexander.Deucher at amd.com>, Daniel Vetter <daniel at ffwll.ch>, "Pan, Xinhui" <Xinhui.Pan at amd.com>, "Tuikov, Luben" <Luben.Tuikov at amd.com>
> Subject: [PATCH 0/3] Use implicit kref infra
> 
>     Use the implicit kref infrastructure to free the container
>     struct amdgpu_device, container of struct drm_device.
>     
>     First, in drm_dev_register(), do not indiscriminately warn
>     when a DRM driver hasn't opted for managed.final_kfree,
>     but instead check if the driver has provided its own
>     "release" function callback in the DRM driver structure.
>     If that is the case, no warning.
>     
>     Remove drmm_add_final_kfree(). We take care of that, in the
>     kref "release" callback when all refs are down to 0, via
>     drm_dev_put(), i.e. the free is implicit.
>     
>     Remove superfluous NULL check, since the DRM device to be
>     suspended always exists, so long as the underlying PCI and
>     DRM devices exist.
>     
>     Luben Tuikov (3):
>       drm: No warn for drivers who provide release
>       drm/amdgpu: Remove drmm final free
>       drm/amdgpu: Remove superfluous NULL check
>     
>      drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ---
>      drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 2 --
>      drivers/gpu/drm/drm_drv.c                  | 3 ++-
>      3 files changed, 2 insertions(+), 6 deletions(-)
>     
>     -- 
>     2.28.0.394.ge197136389
>     
>     
> 


