[PATCH] drm/amdgpu: fix slab-use-after-free in amdgpu_userq_mgr_fini+0x70c
Christian König
christian.koenig at amd.com
Mon Jun 23 15:25:21 UTC 2025
On 23.06.25 17:23, vitaly prosyak wrote:
>
> On 2025-06-23 11:16, Christian König wrote:
>> On 20.06.25 15:49, Alex Deucher wrote:
>>> On Wed, Jun 18, 2025 at 11:58 PM <vitaly.prosyak at amd.com> wrote:
>>>> From: Vitaly Prosyak <vitaly.prosyak at amd.com>
>>>>
>>>> The issue was reproduced on NV10 using IGT pci_unplug test.
>>>> It is expected that `amdgpu_driver_postclose_kms()` is called prior to `amdgpu_drm_release()`.
>>>> However, the bug is that `amdgpu_fpriv` was freed in `amdgpu_driver_postclose_kms()`, and then
>>>> later accessed in `amdgpu_drm_release()` via a call to `amdgpu_userq_mgr_fini()`.
>>>> As a result, KASAN detected a use-after-free condition, as shown in the log below.
>>>> The proposed fix is to move the calls to `amdgpu_eviction_fence_destroy()` and
>>>> `amdgpu_userq_mgr_fini()` into `amdgpu_driver_postclose_kms()`, so they are invoked before
>>>> `amdgpu_fpriv` is freed.
>>>>
>>>> This also ensures symmetry with the initialization path in `amdgpu_driver_open_kms()`,
>>>> where the following components are initialized:
>>>> - `amdgpu_userq_mgr_init()`
>>>> - `amdgpu_eviction_fence_init()`
>>>> - `amdgpu_ctx_mgr_init()`
>>>>
>>>> Correspondingly, in `amdgpu_driver_postclose_kms()` we should clean up using:
>>>> - `amdgpu_userq_mgr_fini()`
>>>> - `amdgpu_eviction_fence_destroy()`
>>>> - `amdgpu_ctx_mgr_fini()`
>>>>
>>>> This change eliminates the use-after-free and improves consistency in resource management between open and close paths.
>>>>
>>>> [ +0.094367] ==================================================================
>>>> [ +0.000026] BUG: KASAN: slab-use-after-free in amdgpu_userq_mgr_fini+0x70c/0x730 [amdgpu]
>>>> [ +0.000866] Write of size 8 at addr ffff88811c068c60 by task amd_pci_unplug/1737
>>>> [ +0.000026] CPU: 3 UID: 0 PID: 1737 Comm: amd_pci_unplug Not tainted 6.14.0+ #2
>>>> [ +0.000008] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING (WI-FI), BIOS 1401 12/03/2020
>>>> [ +0.000004] Call Trace:
>>>> [ +0.000004] <TASK>
>>>> [ +0.000003] dump_stack_lvl+0x76/0xa0
>>>> [ +0.000010] print_report+0xce/0x600
>>>> [ +0.000009] ? amdgpu_userq_mgr_fini+0x70c/0x730 [amdgpu]
>>>> [ +0.000790] ? srso_return_thunk+0x5/0x5f
>>>> [ +0.000007] ? kasan_complete_mode_report_info+0x76/0x200
>>>> [ +0.000008] ? amdgpu_userq_mgr_fini+0x70c/0x730 [amdgpu]
>>>> [ +0.000684] kasan_report+0xbe/0x110
>>>> [ +0.000007] ? amdgpu_userq_mgr_fini+0x70c/0x730 [amdgpu]
>>>> [ +0.000601] __asan_report_store8_noabort+0x17/0x30
>>>> [ +0.000007] amdgpu_userq_mgr_fini+0x70c/0x730 [amdgpu]
>>>> [ +0.000801] ? __pfx_amdgpu_userq_mgr_fini+0x10/0x10 [amdgpu]
>>>> [ +0.000819] ? srso_return_thunk+0x5/0x5f
>>>> [ +0.000008] amdgpu_drm_release+0xa3/0xe0 [amdgpu]
>>>> [ +0.000604] __fput+0x354/0xa90
>>>> [ +0.000010] __fput_sync+0x59/0x80
>>>> [ +0.000005] __x64_sys_close+0x7d/0xe0
>>>> [ +0.000006] x64_sys_call+0x2505/0x26f0
>>>> [ +0.000006] do_syscall_64+0x7c/0x170
>>>> [ +0.000004] ? kasan_record_aux_stack+0xae/0xd0
>>>> [ +0.000005] ? srso_return_thunk+0x5/0x5f
>>>> [ +0.000004] ? kmem_cache_free+0x398/0x580
>>>> [ +0.000006] ? __fput+0x543/0xa90
>>>> [ +0.000006] ? srso_return_thunk+0x5/0x5f
>>>> [ +0.000004] ? __fput+0x543/0xa90
>>>> [ +0.000004] ? __kasan_check_read+0x11/0x20
>>>> [ +0.000007] ? srso_return_thunk+0x5/0x5f
>>>> [ +0.000004] ? __kasan_check_read+0x11/0x20
>>>> [ +0.000003] ? srso_return_thunk+0x5/0x5f
>>>> [ +0.000004] ? fpregs_assert_state_consistent+0x21/0xb0
>>>> [ +0.000006] ? srso_return_thunk+0x5/0x5f
>>>> [ +0.000004] ? syscall_exit_to_user_mode+0x4e/0x240
>>>> [ +0.000005] ? srso_return_thunk+0x5/0x5f
>>>> [ +0.000004] ? do_syscall_64+0x88/0x170
>>>> [ +0.000003] ? srso_return_thunk+0x5/0x5f
>>>> [ +0.000004] ? do_syscall_64+0x88/0x170
>>>> [ +0.000004] ? srso_return_thunk+0x5/0x5f
>>>> [ +0.000004] ? irqentry_exit+0x43/0x50
>>>> [ +0.000004] ? srso_return_thunk+0x5/0x5f
>>>> [ +0.000004] ? exc_page_fault+0x7c/0x110
>>>> [ +0.000006] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>> [ +0.000005] RIP: 0033:0x7ffff7b14f67
>>>> [ +0.000005] Code: ff e8 0d 16 02 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 73 ba f7 ff
>>>> [ +0.000004] RSP: 002b:00007fffffffe358 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
>>>> [ +0.000006] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffff7b14f67
>>>> [ +0.000003] RDX: 0000000000000000 RSI: 00007ffff7f5755a RDI: 0000000000000003
>>>> [ +0.000003] RBP: 00007fffffffe380 R08: 0000555555568170 R09: 0000000000000000
>>>> [ +0.000003] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fffffffe5c8
>>>> [ +0.000003] R13: 00005555555552a9 R14: 0000555555557d48 R15: 00007ffff7ffd040
>>>> [ +0.000007] </TASK>
>>>>
>>>> [ +0.000286] Allocated by task 425 on cpu 11 at 29.751192s:
>>>> [ +0.000013] kasan_save_stack+0x28/0x60
>>>> [ +0.000008] kasan_save_track+0x18/0x70
>>>> [ +0.000006] kasan_save_alloc_info+0x38/0x60
>>>> [ +0.000006] __kasan_kmalloc+0xc1/0xd0
>>>> [ +0.000005] __kmalloc_cache_noprof+0x1bd/0x430
>>>> [ +0.000006] amdgpu_driver_open_kms+0x172/0x760 [amdgpu]
>>>> [ +0.000521] drm_file_alloc+0x569/0x9a0
>>>> [ +0.000008] drm_client_init+0x1b7/0x410
>>>> [ +0.000007] drm_fbdev_client_setup+0x174/0x470
>>>> [ +0.000007] drm_client_setup+0x8a/0xf0
>>>> [ +0.000006] amdgpu_pci_probe+0x50b/0x10d0 [amdgpu]
>>>> [ +0.000482] local_pci_probe+0xe7/0x1b0
>>>> [ +0.000008] pci_device_probe+0x5bf/0x890
>>>> [ +0.000005] really_probe+0x1fd/0x950
>>>> [ +0.000007] __driver_probe_device+0x307/0x410
>>>> [ +0.000005] driver_probe_device+0x4e/0x150
>>>> [ +0.000006] __driver_attach+0x223/0x510
>>>> [ +0.000005] bus_for_each_dev+0x102/0x1a0
>>>> [ +0.000006] driver_attach+0x3d/0x60
>>>> [ +0.000005] bus_add_driver+0x309/0x650
>>>> [ +0.000005] driver_register+0x13d/0x490
>>>> [ +0.000006] __pci_register_driver+0x1ee/0x2b0
>>>> [ +0.000006] xfrm_ealg_get_byidx+0x43/0x50 [xfrm_algo]
>>>> [ +0.000008] do_one_initcall+0x9c/0x3e0
>>>> [ +0.000007] do_init_module+0x29e/0x7f0
>>>> [ +0.000006] load_module+0x5c75/0x7c80
>>>> [ +0.000006] init_module_from_file+0x106/0x180
>>>> [ +0.000007] idempotent_init_module+0x377/0x740
>>>> [ +0.000006] __x64_sys_finit_module+0xd7/0x180
>>>> [ +0.000006] x64_sys_call+0x1f0b/0x26f0
>>>> [ +0.000006] do_syscall_64+0x7c/0x170
>>>> [ +0.000005] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>>
>>>> [ +0.000013] Freed by task 1737 on cpu 9 at 76.455063s:
>>>> [ +0.000010] kasan_save_stack+0x28/0x60
>>>> [ +0.000006] kasan_save_track+0x18/0x70
>>>> [ +0.000005] kasan_save_free_info+0x3b/0x60
>>>> [ +0.000006] __kasan_slab_free+0x54/0x80
>>>> [ +0.000005] kfree+0x127/0x470
>>>> [ +0.000006] amdgpu_driver_postclose_kms+0x455/0x760 [amdgpu]
>>>> [ +0.000485] drm_file_free.part.0+0x5b1/0xba0
>>>> [ +0.000007] drm_file_free+0x13/0x30
>>>> [ +0.000006] drm_client_release+0x1c4/0x2b0
>>>> [ +0.000006] drm_fbdev_ttm_fb_destroy+0xd2/0x120 [drm_ttm_helper]
>>>> [ +0.000007] put_fb_info+0x97/0xe0
>>>> [ +0.000006] unregister_framebuffer+0x197/0x380
>>>> [ +0.000005] drm_fb_helper_unregister_info+0x94/0x100
>>>> [ +0.000005] drm_fbdev_client_unregister+0x3c/0x80
>>>> [ +0.000007] drm_client_dev_unregister+0x144/0x330
>>>> [ +0.000006] drm_dev_unregister+0x49/0x1b0
>>>> [ +0.000006] drm_dev_unplug+0x4c/0xd0
>>>> [ +0.000006] amdgpu_pci_remove+0x58/0x130 [amdgpu]
>>>> [ +0.000482] pci_device_remove+0xae/0x1e0
>>>> [ +0.000006] device_remove+0xc7/0x180
>>>> [ +0.000006] device_release_driver_internal+0x3d4/0x5a0
>>>> [ +0.000007] device_release_driver+0x12/0x20
>>>> [ +0.000006] pci_stop_bus_device+0x104/0x150
>>>> [ +0.000006] pci_stop_and_remove_bus_device_locked+0x1b/0x40
>>>> [ +0.000005] remove_store+0xd7/0xf0
>>>> [ +0.000007] dev_attr_store+0x3f/0x80
>>>> [ +0.000006] sysfs_kf_write+0x125/0x1d0
>>>> [ +0.000005] kernfs_fop_write_iter+0x2ea/0x490
>>>> [ +0.000007] vfs_write+0x90d/0xe70
>>>> [ +0.000006] ksys_write+0x119/0x220
>>>> [ +0.000006] __x64_sys_write+0x72/0xc0
>>>> [ +0.000006] x64_sys_call+0x18ab/0x26f0
>>>> [ +0.000005] do_syscall_64+0x7c/0x170
>>>> [ +0.000005] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>>
>>>> [ +0.000013] The buggy address belongs to the object at ffff88811c068000
>>>> which belongs to the cache kmalloc-rnd-01-4k of size 4096
>>>> [ +0.000016] The buggy address is located 3168 bytes inside of
>>>> freed 4096-byte region [ffff88811c068000, ffff88811c069000)
>>>>
>>>> [ +0.000022] The buggy address belongs to the physical page:
>>>> [ +0.000010] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff88811c06e000 pfn:0x11c068
>>>> [ +0.000006] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
>>>> [ +0.000006] flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
>>>> [ +0.000007] page_type: f5(slab)
>>>> [ +0.000007] raw: 0017ffffc0000040 ffff88810004c140 dead000000000122 0000000000000000
>>>> [ +0.000005] raw: ffff88811c06e000 0000000080040002 00000000f5000000 0000000000000000
>>>> [ +0.000006] head: 0017ffffc0000040 ffff88810004c140 dead000000000122 0000000000000000
>>>> [ +0.000005] head: ffff88811c06e000 0000000080040002 00000000f5000000 0000000000000000
>>>> [ +0.000006] head: 0017ffffc0000003 ffffea0004701a01 ffffffffffffffff 0000000000000000
>>>> [ +0.000005] head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
>>>> [ +0.000004] page dumped because: kasan: bad access detected
>>>>
>>>> [ +0.000011] Memory state around the buggy address:
>>>> [ +0.000009] ffff88811c068b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>> [ +0.000012] ffff88811c068b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>> [ +0.000011] >ffff88811c068c00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>> [ +0.000011] ^
>>>> [ +0.000010] ffff88811c068c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>> [ +0.000011] ffff88811c068d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>> [ +0.000011] ==================================================================
>>>>
>>>> Cc: Alex Deucher <alexander.deucher at amd.com>
>>>> Cc: Christian König <christian.koenig at amd.com>
>>>> Cc: Lijo Lazar <lijo.lazar at amd.com>
>>>> Cc: Jesse Zhang <Jesse.Zhang at amd.com>
>>>> Cc: Arvind Yadav <arvind.yadav at amd.com>
>>> Please add:
>>> Fixes: adba0929736a ("drm/amdgpu: Fix Illegal opcode in command stream Error")
>>>
>>> Reviewed-by: Alex Deucher <alexander.deucher at amd.com>
>> Sorry, I was on vacation last week.
>>
>> Clear NAK to that here, this will massively break things!
>
> Could you clarify what issues are introduced by this patch? I validated it using NV31 user queues and didn’t observe any problems
Without this code when an application crashes drm_release() will first free up all memory and then disable the queues.
It is mandatory that we do it the other way around.
Regards,
Christian.
>
> Thanks, Vitaly
>
>> Regards,
>> Christian.
>>
>>
>>>> v2: drop amdgpu_drm_release() and assign drm_release()
>>>> as the callback directly.(Alex)
>>>>
>>>> Signed-off-by: Vitaly Prosyak <vitaly.prosyak at amd.com>
>>>> Change-Id: I7ab41fd785dcd6f2651c726337ca07519ceae574
>>>> ---
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 16 +---------------
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 3 +++
>>>> 2 files changed, 4 insertions(+), 15 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>> index 7fd233f160bf..818a49e69aea 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>> @@ -2912,20 +2912,6 @@ static int amdgpu_pmops_runtime_idle(struct device *dev)
>>>> return ret;
>>>> }
>>>>
>>>> -static int amdgpu_drm_release(struct inode *inode, struct file *filp)
>>>> -{
>>>> - struct drm_file *file_priv = filp->private_data;
>>>> - struct amdgpu_fpriv *fpriv = file_priv->driver_priv;
>>>> -
>>>> - if (fpriv) {
>>>> - fpriv->evf_mgr.fd_closing = true;
>>>> - amdgpu_eviction_fence_destroy(&fpriv->evf_mgr);
>>>> - amdgpu_userq_mgr_fini(&fpriv->userq_mgr);
>>>> - }
>>>> -
>>>> - return drm_release(inode, filp);
>>>> -}
>>>> -
>>>> long amdgpu_drm_ioctl(struct file *filp,
>>>> unsigned int cmd, unsigned long arg)
>>>> {
>>>> @@ -2977,7 +2963,7 @@ static const struct file_operations amdgpu_driver_kms_fops = {
>>>> .owner = THIS_MODULE,
>>>> .open = drm_open,
>>>> .flush = amdgpu_flush,
>>>> - .release = amdgpu_drm_release,
>>>> + .release = drm_release,
>>>> .unlocked_ioctl = amdgpu_drm_ioctl,
>>>> .mmap = drm_gem_mmap,
>>>> .poll = drm_poll,
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>>> index d2ce7d86dbc8..195ed81d39ff 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>>> @@ -1501,6 +1501,9 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
>>>> amdgpu_vm_bo_del(adev, fpriv->prt_va);
>>>> amdgpu_bo_unreserve(pd);
>>>> }
>>>> + fpriv->evf_mgr.fd_closing = true;
>>>> + amdgpu_eviction_fence_destroy(&fpriv->evf_mgr);
>>>> + amdgpu_userq_mgr_fini(&fpriv->userq_mgr);
>>>>
>>>> amdgpu_ctx_mgr_fini(&fpriv->ctx_mgr);
>>>> amdgpu_vm_fini(adev, &fpriv->vm);
>>>> --
>>>> 2.34.1
>>>>
More information about the amd-gfx
mailing list