[PATCH] drm/amdgpu: Fix NULL dereference in amdgpu_userq_restore_worker
Christian König
christian.koenig at amd.com
Thu May 8 09:22:13 UTC 2025
On 5/8/25 07:04, Yadav, Arvind wrote:
>
> On 5/8/2025 12:36 AM, Alex Deucher wrote:
>> On Wed, May 7, 2025 at 2:38 PM Arvind Yadav <Arvind.Yadav at amd.com> wrote:
>>> Switch cancel_delayed_work() to cancel_delayed_work_sync() to ensure
>>> the delayed work has finished executing before proceeding with
>>> resource cleanup. This prevents a potential use-after-free or
>>> NULL dereference if the resume_work is still running during finalization.
>> There are several other places with similar patterns that look
>> suspect. E.g., amdgpu_userq_destroy() and amdgpu_userq_evict().
> Noted, I will make the changes.
Also keep an eye out for lockdep errors: e.g. compile the kernel with lockdep enabled and make sure that the code flow is executed at least once.
Apart from that, good catch,
Christian.
> ~arvind
>> Alex
>>
>>> BUG: kernel NULL pointer dereference, address: 0000000000000140
>>> [ +0.000050] #PF: supervisor read access in kernel mode
>>> [ +0.000019] #PF: error_code(0x0000) - not-present page
>>> [ +0.000021] PGD 0 P4D 0
>>> [ +0.000015] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
>>> [ +0.000021] CPU: 17 UID: 0 PID: 196299 Comm: kworker/17:0 Tainted: G U 6.14.0-org-staging #1
>>> [ +0.000032] Tainted: [U]=USER
>>> [ +0.000015] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS ELITE/X570 AORUS ELITE, BIOS F39 03/22/2024
>>> [ +0.000029] Workqueue: events amdgpu_userq_restore_worker [amdgpu]
>>> [ +0.000426] RIP: 0010:drm_exec_lock_obj+0x32/0x210 [drm_exec]
>>> [ +0.000025] Code: e5 41 57 41 56 41 55 49 89 f5 41 54 49 89 fc 48 83 ec 08 4c 8b 77 30 4d 85 f6 0f 85 c0 00 00 00 4c 8d 7f 08 48 39 77 38 74 54 <49> 8b bd f8 00 00 00 4c 89 fe 41 f6 04 24 01 75 3c e8 08 50 bc e0
>>> [ +0.000046] RSP: 0018:ffffab1b04da3ce8 EFLAGS: 00010297
>>> [ +0.000020] RAX: 0000000000000001 RBX: ffff930cc60e4bc0 RCX: 0000000000000000
>>> [ +0.000025] RDX: 0000000000000004 RSI: 0000000000000048 RDI: ffffab1b04da3d88
>>> [ +0.000028] RBP: ffffab1b04da3d10 R08: ffff930cc60e4000 R09: 0000000000000000
>>> [ +0.000022] R10: ffffab1b04da3d18 R11: 0000000000000001 R12: ffffab1b04da3d88
>>> [ +0.000023] R13: 0000000000000048 R14: 0000000000000000 R15: ffffab1b04da3d90
>>> [ +0.000023] FS: 0000000000000000(0000) GS:ffff9313dea80000(0000) knlGS:0000000000000000
>>> [ +0.000024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ +0.000021] CR2: 0000000000000140 CR3: 000000018351a000 CR4: 0000000000350ef0
>>> [ +0.000025] Call Trace:
>>> [ +0.000018] <TASK>
>>> [ +0.000015] ? show_regs+0x69/0x80
>>> [ +0.000022] ? __die+0x25/0x70
>>> [ +0.000019] ? page_fault_oops+0x15d/0x510
>>> [ +0.000024] ? do_user_addr_fault+0x312/0x690
>>> [ +0.000024] ? sched_clock_cpu+0x10/0x1a0
>>> [ +0.000028] ? exc_page_fault+0x78/0x1b0
>>> [ +0.000025] ? asm_exc_page_fault+0x27/0x30
>>> [ +0.000024] ? drm_exec_lock_obj+0x32/0x210 [drm_exec]
>>> [ +0.000024] drm_exec_prepare_obj+0x21/0x60 [drm_exec]
>>> [ +0.000021] amdgpu_vm_lock_pd+0x22/0x30 [amdgpu]
>>> [ +0.000266] amdgpu_userq_validate_bos+0x6c/0x320 [amdgpu]
>>> [ +0.000333] amdgpu_userq_restore_worker+0x4a/0x120 [amdgpu]
>>> [ +0.000316] process_one_work+0x189/0x3c0
>>> [ +0.000021] worker_thread+0x2a4/0x3b0
>>> [ +0.000022] kthread+0x109/0x220
>>> [ +0.000018] ? __pfx_worker_thread+0x10/0x10
>>> [ +0.000779] ? _raw_spin_unlock_irq+0x1f/0x40
>>> [ +0.000560] ? __pfx_kthread+0x10/0x10
>>> [ +0.000543] ret_from_fork+0x3c/0x60
>>> [ +0.000507] ? __pfx_kthread+0x10/0x10
>>> [ +0.000515] ret_from_fork_asm+0x1a/0x30
>>> [ +0.000515] </TASK>
>>>
>>> Cc: Alex Deucher <alexander.deucher at amd.com>
>>> Cc: Christian König <christian.koenig at amd.com>
>>> Cc: Sunil Khatri <sunil.khatri at amd.com>
>>> Signed-off-by: Arvind Yadav <arvind.yadav at amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
>>> index afbe01149ed3..711e190a6a82 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
>>> @@ -774,7 +774,7 @@ void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr)
>>>  	struct amdgpu_userq_mgr *uqm, *tmp;
>>>  	uint32_t queue_id;
>>>
>>> -	cancel_delayed_work(&userq_mgr->resume_work);
>>> +	cancel_delayed_work_sync(&userq_mgr->resume_work);
>>>
>>>  	mutex_lock(&userq_mgr->userq_mutex);
>>>  	idr_for_each_entry(&userq_mgr->userq_idr, queue, queue_id) {
>>> --
>>> 2.34.1
>>>