[PATCH] drm/amdgpu: Fix a redundant kfree

Luben Tuikov luben.tuikov at amd.com
Wed Sep 2 03:47:26 UTC 2020


On 2020-09-01 5:58 p.m., Pan, Xinhui wrote:
> [AMD Official Use Only - Internal Distribution Only]
> 
> 
> 
> The correct thing to do this is to
> _leave the amdgpu_driver_release()_ alone,
> remove "drmm_add_final_kfree()" and qualify
> the WARN_ON() in drm_dev_register() by
> the existence of drm_driver.release() (i.e. non-NULL).
> 
> Re:  this drm driver release callback is more like a release notify. It is called in the beginning of the total release sequence. As you have made drm device a member of adev. So you can not free adev in the driver release callback.
> 

No. The DRM driver release callback is not "more like release notify".
When the kref goes to 0, the release callback is called. The kref
infrastructure has been around in the kernel for about 20 years now.

I don't understand what a "total release sequence" is. Is there
a "partial release sequence"? The stack below shows an example
of a release. It's very simple, when the kref goes to 0,
the DRM driver's callback is called.

> Maybe change the sequence of release,  say, put drm driver release in the end of total release sequence.
> Or still use the final_kfree to free adev and our release callback just do some other cleanup work.

No. "final kfree" is a hack. What we want to use is the implicit
nature of the kref infrastructure. You don't really care to "free",
how and when it happens. You just forget about "free."
It will implicitly take place when the kref transitions from non-zero
to 0. It's implicit.

Another advantage of using DRM driver's "release" callback
on kref going to 0, is that we can do other interesting stuff
just before we free the container structure amdgpu_device.

And most of all, we don't really need to be concerned with
sequencing the free, how, when, etc., it naturally takes place
on a kref-put, when that transitions the kref to 0.

Regards,
Luben


> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> *From:* Tuikov, Luben <Luben.Tuikov at amd.com>
> *Sent:* Wednesday, September 2, 2020 4:35:32 AM
> *To:* Alex Deucher <alexdeucher at gmail.com>; Pan, Xinhui <Xinhui.Pan at amd.com>; Daniel Vetter <daniel.vetter at ffwll.ch>
> *Cc:* amd-gfx at lists.freedesktop.org <amd-gfx at lists.freedesktop.org>; Deucher, Alexander <Alexander.Deucher at amd.com>
> *Subject:* Re: [PATCH] drm/amdgpu: Fix a redundant kfree
>  
> On 2020-09-01 10:12 a.m., Alex Deucher wrote:
>> On Tue, Sep 1, 2020 at 3:46 AM Pan, Xinhui <Xinhui.Pan at amd.com> wrote:
>>>
>>> [AMD Official Use Only - Internal Distribution Only]
>>>
>>> drm_dev_alloc() alloc *dev* and set managed.final_kfree to dev to free
>>> itself.
>>> Now from commit 5cdd68498918("drm/amdgpu: Embed drm_device into
>>> amdgpu_device (v3)") we alloc *adev* and ddev is just a member of it.
>>> So drm_dev_release try to free a wrong pointer then.
>>>
>>> Also driver's release trys to free adev, but drm_dev_release will
>>> access dev after call drvier's release.
>>>
>>> To fix it, remove driver's release and set managed.final_kfree to adev.
>> 
>> I've got to admit, the documentation around drm_dev_init is hard to
>> follow.  We aren't really using the drmm stuff, but you have to use
>> drmm_add_final_kfree to avoid a warning.  The logic seems to make
>> sense though.
>> Acked-by: Alex Deucher <alexancer.deucher at amd.com>
> 
> The logic in patch 3/3 uses the kref infrastructure
> as described in drm_drv.c's comment on what the DRM
> usage is, i.e. taking advantage of the kref infrastructure.
> 
> In amdgpu_pci_probe() we call drm_dev_init() which takes
> a ref of 1 on the kref in the DRM device structure,
> and from then on, only when the kref transitions
> from non-zero to 0, do we free the container of
> DRM device, and this is beautifully shown in the
> kernel stack below (please take a look at the kernel
> stack below).
> 
> Using a kref is very powerful as it is implicit:
> when the kref transitions from non-zero to 0,
> then call the release method.
> 
> Furthermore, we own the release method, and we
> like that, as it is pure, as well as,
> there may be more things we'd like to do in the future
> before we free the amdgpu device: maybe free memory we're
> using specifically for that PCI device, maybe write
> some registers, maybe notify someone or something, etc.
> 
> On another note, adding "drmm_add_final_kfree()" in the middle
> of amdgpu_pci_probe() seems hackish--it's neither part
> of drm_dev_init() nor of drm_dev_register(). We really
> don't need it, since we rely on the kref infrastructure
> to tell us when to free the device, and if you'd look
> at the beautiful stack below, it knows exactly when that is,
> i.e. when to free it.
> 
> The correct thing to do this is to
> _leave the amdgpu_driver_release()_ alone,
> remove "drmm_add_final_kfree()" and qualify
> the WARN_ON() in drm_dev_register() by
> the existence of drm_driver.release() (i.e. non-NULL).
> 
> I'll submit a sequence of patches to fix this right.
> 
> Regards,
> Luben
> 
>> 
>>>
>>> [   36.269348] BUG: unable to handle page fault for address: ffffa0c279940028
>>> [   36.276841] #PF: supervisor read access in kernel mode
>>> [   36.282434] #PF: error_code(0x0000) - not-present page
>>> [   36.288053] PGD 676601067 P4D 676601067 PUD 86a414067 PMD 86a247067 PTE 800ffff8066bf060
>>> [   36.296868] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC NOPTI
>>> [   36.302409] CPU: 4 PID: 1375 Comm: bash Tainted: G           O      5.9.0-rc2+ #46
>>> [   36.310670] Hardware name: System manufacturer System Product Name/PRIME Z390-A, BIOS 1401 11/26/2019
>>> [   36.320725] RIP: 0010:drm_managed_release+0x25/0x110 [drm]
>>> [   36.326741] Code: 80 00 00 00 00 0f 1f 44 00 00 55 48 c7 c2 5a 9f 41 c0 be 00 02 00 00 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 48 83 ec 08 <48> 8b 7f 18 e8 c2 10 ff ff 4d 8b 74 24 20 49 8d 44 24 5
>>> [   36.347217] RSP: 0018:ffffb9424141fce0 EFLAGS: 00010282
>>> [   36.352931] RAX: 0000000000000006 RBX: ffffa0c279940010 RCX: 0000000000000006
>>> [   36.360718] RDX: ffffffffc0419f5a RSI: 0000000000000200 RDI: ffffa0c279940010
>>> [   36.368503] RBP: ffffb9424141fd10 R08: 0000000000000001 R09: 0000000000000001
>>> [   36.376304] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa0c279940010
>>> [   36.384070] R13: ffffffffc0e2a000 R14: ffffa0c26924e220 R15: fffffffffffffff2
>>> [   36.391845] FS:  00007fc4a277b740(0000) GS:ffffa0c288e00000(0000) knlGS:0000000000000000
>>> [   36.400669] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [   36.406937] CR2: ffffa0c279940028 CR3: 0000000792304006 CR4: 00000000003706e0
>>> [   36.414732] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [   36.422550] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> [   36.430354] Call Trace:
>>> [   36.433044]  drm_dev_put.part.0+0x40/0x60 [drm]
>>> [   36.438017]  drm_dev_put+0x13/0x20 [drm]
>>> [   36.442398]  amdgpu_pci_remove+0x56/0x60 [amdgpu]
>>> [   36.447528]  pci_device_remove+0x3e/0xb0
>>> [   36.451807]  device_release_driver_internal+0xff/0x1d0
>>> [   36.457416]  device_release_driver+0x12/0x20
>>> [   36.462094]  pci_stop_bus_device+0x70/0xa0
>>> [   36.466588]  pci_stop_and_remove_bus_device_locked+0x1b/0x30
>>> [   36.472786]  remove_store+0x7b/0x90
>>> [   36.476614]  dev_attr_store+0x17/0x30
>>> [   36.480646]  sysfs_kf_write+0x4b/0x60
>>> [   36.484655]  kernfs_fop_write+0xe8/0x1d0
>>> [   36.488952]  vfs_write+0xf5/0x230
>>> [   36.492562]  ksys_write+0x70/0xf0
>>> [   36.496206]  __x64_sys_write+0x1a/0x20
>>> [   36.500292]  do_syscall_64+0x38/0x90
>>> [   36.504219]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>
>>> Signed-off-by: xinhui pan <xinhui.pan at amd.com>
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 10 +---------
>>>  1 file changed, 1 insertion(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> index c12e9acd421d..52fc0c6c8538 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> @@ -1165,7 +1165,7 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
>>>  if (ret)
>>>  goto err_free;
>>>
>>> -drmm_add_final_kfree(ddev, ddev);
>>> +drmm_add_final_kfree(ddev, adev);
>>>
>>>  if (!supports_atomic)
>>>  ddev->driver_features &= ~DRIVER_ATOMIC;
>>> @@ -1221,13 +1221,6 @@ amdgpu_pci_remove(struct pci_dev *pdev)
>>>  drm_dev_put(dev);
>>>  }
>>>
>>> -static void amdgpu_driver_release(struct drm_device *ddev)
>>> -{
>>> -struct amdgpu_device *adev = drm_to_adev(ddev);
>>> -
>>> -kfree(adev);
>>> -}
>>> -
>>>  static void
>>>  amdgpu_pci_shutdown(struct pci_dev *pdev)
>>>  {
>>> @@ -1522,7 +1515,6 @@ static struct drm_driver kms_driver = {
>>>  .open = amdgpu_driver_open_kms,
>>>  .postclose = amdgpu_driver_postclose_kms,
>>>  .lastclose = amdgpu_driver_lastclose_kms,
>>> -.release   = amdgpu_driver_release,
>>>  .irq_handler = amdgpu_irq_handler,
>>>  .ioctls = amdgpu_ioctls_kms,
>>>  .gem_free_object_unlocked = amdgpu_gem_object_free,
>>> --
>>> 2.25.1
>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx at lists.freedesktop.org
>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=02%7C01%7CLuben.Tuikov%40amd.com%7C5233b0caaac449417a7208d84e810de0%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637345663485193534&sdata=A%2FzJpoQmUhGBGts5mQfp%2BlVu1FL5NVbe5qfRelb2Og8%3D&reserved=0 <https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=02%7C01%7CLuben.Tuikov%40amd.com%7C5233b0caaac449417a7208d84e810de0%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637345663485193534&sdata=A%2FzJpoQmUhGBGts5mQfp%2BlVu1FL5NVbe5qfRelb2Og8%3D&reserved=0>
> 



More information about the amd-gfx mailing list