[PATCH 1/1] drm/amdgpu: Take a reference to an exported BO
Christian König
ckoenig.leichtzumerken at gmail.com
Tue May 5 07:47:42 UTC 2020
Just to reply here once more, this patch is a clear NAK.
The references are grabbed in the call path of drm_gem_prime_export()
and dropped again in drm_gem_dmabuf_release().
So they are perfectly balanced as far as I can see.
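
To illustrate the balance argument, here is a minimal userspace sketch in C.
It is a toy model only: struct toy_obj and every toy_* function are
hypothetical stand-ins, not the real DRM or amdgpu code. It assumes that the
export path takes exactly one reference on behalf of the dma-buf and that the
release path drops exactly one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a GEM object; NOT the real struct drm_gem_object. */
struct toy_obj {
	int refcount;
	int freed;
};

/* Take one reference. */
static void toy_get(struct toy_obj *obj)
{
	obj->refcount++;
}

/* Drop one reference; mark the object freed when the count reaches zero. */
static void toy_put(struct toy_obj *obj)
{
	assert(obj->refcount > 0);
	if (--obj->refcount == 0)
		obj->freed = 1;
}

/* Assumption: models the export path taking a reference for the dma-buf. */
static void toy_export(struct toy_obj *obj)
{
	toy_get(obj);
}

/* Assumption: models the dma-buf release path dropping that reference. */
static void toy_dmabuf_release(struct toy_obj *obj)
{
	toy_put(obj);
}

/*
 * Run one export/release cycle on an object that starts with a single
 * reference held by its creator, and return the final refcount.
 * A non-zero extra_ref models an additional reference taken after export.
 */
static int toy_balance(int extra_ref)
{
	struct toy_obj obj = { .refcount = 1, .freed = 0 };

	toy_export(&obj);
	if (extra_ref)
		toy_get(&obj);
	toy_dmabuf_release(&obj);

	return obj.refcount;
}
```

In this model toy_balance(0) returns to the creator's single reference,
while toy_balance(1) ends one reference higher, so under the stated
assumption an additional reference taken at export time is never dropped.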
Regards,
Christian.
On 01.05.20 at 16:44, Felix Kuehling wrote:
>
> [dropping my gmail address]
>
> We saw this backtrace showing the call chain while investigating a
> kernel oops caused by this issue on the DKMS branch with the KFD IPC
> API. It happens after a dma-buf file is released with fput:
>
> [ 1255.049330] BUG: kernel NULL pointer dereference, address: 000000000000051e
> [ 1255.049727] #PF: supervisor read access in kernel mode
> [ 1255.050092] #PF: error_code(0x0000) - not-present page
> [ 1255.050416] PGD 0 P4D 0
> [ 1255.050736] Oops: 0000 [#1] SMP PTI
> [ 1255.051060] CPU: 27 PID: 2292 Comm: kworker/27:2 Tainted: G OE 5.3.0-46-generic #38~18.04.1-Ubuntu
> [ 1255.051400] Hardware name: Supermicro SYS-4029GP-TRT2/X11DPG-OT-CPU, BIOS 3.0a 02/26/2019
> [ 1255.051752] Workqueue: events delayed_fput
> [ 1255.052111] RIP: 0010:drm_gem_object_put_unlocked+0x1c/0x70 [drm]
> [ 1255.052465] Code: 4d 80 c8 ee 0f 0b eb d8 66 0f 1f 44 00 00 0f 1f 44 00 00 48 85 ff 74 34 55 48 89 e5 41 54 53 48 89 fb 48 8b 7f 08 48 8b 47 20 <48> 83 b8 a0 00 00 00 00 74 1a 4c 8d 67 68 48 89 df 4c 89 e6 e8 9b
> [ 1255.053224] RSP: 0018:ffffb4b62035fdc8 EFLAGS: 00010286
> [ 1255.053613] RAX: 000000000000047e RBX: ffff9f2add197850 RCX: 0000000000000000
> [ 1255.054032] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9f2aa2548aa0
> [ 1255.054440] RBP: ffffb4b62035fdd8 R08: 0000000000000000 R09: 0000000000000000
> [ 1255.054860] R10: 0000000000000010 R11: ffff9f2a4b1cc310 R12: 0000000000080005
> [ 1255.055268] R13: ffff9f2a4b1cc310 R14: ffff9f4e369161e0 R15: ffff9f2a1b2f9080
> [ 1255.055674] FS: 0000000000000000(0000) GS:ffff9f4e3f740000(0000) knlGS:0000000000000000
> [ 1255.056087] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1255.056501] CR2: 000000000000051e CR3: 00000002df00a004 CR4: 00000000007606e0
> [ 1255.056923] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 1255.057345] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 1255.057763] PKRU: 55555554
> [ 1255.058179] Call Trace:
> [ 1255.058603] drm_gem_dmabuf_release+0x1a/0x30 [drm]
> [ 1255.059025] dma_buf_release+0x56/0x130
> [ 1255.059443] __fput+0xc6/0x260
> [ 1255.059856] delayed_fput+0x20/0x30
> [ 1255.060272] process_one_work+0x1fd/0x3f0
> [ 1255.060686] worker_thread+0x34/0x410
> [ 1255.061099] kthread+0x121/0x140
> [ 1255.061510] ? process_one_work+0x3f0/0x3f0
> [ 1255.061923] ? kthread_park+0xb0/0xb0
> [ 1255.062336] ret_from_fork+0x35/0x40
>
> drm_gem_object_put_unlocked calls drm_gem_object_free when the
> obj->refcount reaches 0. From there it calls
> dev->driver->gem_free_object_unlocked, which is amdgpu_gem_object_free
> in amdgpu.
>
> Regards,
> Felix
>
> On 2020-05-01 at 10:29 a.m., Christian König wrote:
>> On 01.05.20 at 16:21, Felix Kuehling wrote:
>>> From: Felix Kuehling <felix.kuehling at gmail.com>
>>>
>>> That reference gets dropped when the dma-buf is freed. Not incrementing
>>> the refcount can lead to use-after-free errors.
>>>
>>> Signed-off-by: Felix Kuehling <felix.kuehling at gmail.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 9 ++++++++-
>>> 1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
>>> index ffeb20f11c07..a0f9b3ef4aad 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
>>> @@ -398,8 +398,15 @@ struct dma_buf *amdgpu_gem_prime_export(struct drm_gem_object *gobj,
>>> return ERR_PTR(-EPERM);
>>> buf = drm_gem_prime_export(gobj, flags);
>>> - if (!IS_ERR(buf))
>>> + if (!IS_ERR(buf)) {
>>> buf->ops = &amdgpu_dmabuf_ops;
>>> + /* GEM needs a reference to the underlying object
>>> + * that gets dropped when the dma-buf is released,
>>> + * through the amdgpu_gem_object_free callback
>>> + * from drm_gem_object_put_unlocked.
>>> + */
>>> + amdgpu_bo_ref(bo);
>>> + }
>>
>> Offhand, that doesn't sound correct to me. Why should the exported bo
>> be closed through amdgpu_gem_object_free()?
>>
>> Regards,
>> Christian.
>>
>>> return buf;
>>> }
>>
>