[PATCH v2] drm/amdgpu: Fix NULL dereference in amdgpu_userq_restore_worker

Thu May 8 09:33:40 UTC 2025

Reviewed-by: Christian König <christian.koenig at amd.com>

On 5/8/25 09:03, Khatri, Sunil wrote:
> Reviewed-by: Sunil Khatri <sunil.khatri at amd.com>
> 
> -----Original Message-----
> From: Yadav, Arvind <Arvind.Yadav at amd.com>
> Sent: Thursday, May 8, 2025 11:06 AM
> To: Koenig, Christian <Christian.Koenig at amd.com>; Deucher, Alexander <Alexander.Deucher at amd.com>; Khatri, Sunil <Sunil.Khatri at amd.com>
> Cc: amd-gfx at lists.freedesktop.org; Yadav, Arvind <Arvind.Yadav at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>; Yadav, Arvind <Arvind.Yadav at amd.com>
> Subject: [PATCH v2] drm/amdgpu: Fix NULL dereference in amdgpu_userq_restore_worker
> 
> Switch cancel_delayed_work() to cancel_delayed_work_sync() to ensure the delayed work has finished executing before proceeding with resource cleanup. This prevents a potential use-after-free or NULL dereference if the resume_work is still running during finalization.
> 
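For readers less familiar with the workqueue API: cancel_delayed_work() only removes a pending item and does not wait for a handler that is already running, while cancel_delayed_work_sync() also waits for a running handler to return. The sketch below is a userspace analogy using pthreads, not the driver code. All names in it (demo_mgr, demo_work, demo_cancel_work_sync, ...) are hypothetical and chosen only for illustration. It assumes a single worker thread that keeps touching a resource owned by the manager.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for a manager that owns a background work item. */
struct demo_mgr {
	pthread_t worker;       /* thread that runs the "work" */
	atomic_bool cancelled;  /* set to ask the worker to stop */
	atomic_int runs;        /* how many times the work body ran */
	int *resource;          /* state the work body dereferences */
};

/* Work body: dereferences mgr->resource until cancellation is requested. */
static void *demo_work(void *arg)
{
	struct demo_mgr *mgr = arg;

	while (!atomic_load(&mgr->cancelled)) {
		*mgr->resource += 1;
		atomic_fetch_add(&mgr->runs, 1);
		usleep(1000);
	}
	return NULL;
}

/* Allocate the resource and start the worker. Returns 0 on success. */
static int demo_mgr_init(struct demo_mgr *mgr)
{
	mgr->resource = calloc(1, sizeof(*mgr->resource));
	if (!mgr->resource)
		return -1;
	atomic_init(&mgr->cancelled, false);
	atomic_init(&mgr->runs, 0);
	if (pthread_create(&mgr->worker, NULL, demo_work, mgr) != 0) {
		free(mgr->resource);
		mgr->resource = NULL;
		return -1;
	}
	return 0;
}

/*
 * Analogue of the _sync variant: request cancellation AND wait until the
 * work body has returned. Setting the flag alone (the non-sync analogue)
 * would let the worker race with the teardown below.
 */
static void demo_cancel_work_sync(struct demo_mgr *mgr)
{
	atomic_store(&mgr->cancelled, true);
	pthread_join(mgr->worker, NULL);
}

/* Teardown: only free the resource once the worker is known to be done. */
static int demo_mgr_fini(struct demo_mgr *mgr)
{
	int final;

	demo_cancel_work_sync(mgr);
	final = *mgr->resource;
	free(mgr->resource);
	mgr->resource = NULL;
	return final;
}
```

A caller would pair demo_mgr_init() with demo_mgr_fini(). If the pthread_join() were dropped, the worker could still dereference mgr->resource after free(), which is the same class of race the commit message describes for resume_work.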
> BUG: kernel NULL pointer dereference, address: 0000000000000140
> [  +0.000050] #PF: supervisor read access in kernel mode
> [  +0.000019] #PF: error_code(0x0000) - not-present page
> [  +0.000021] PGD 0 P4D 0
> [  +0.000015] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> [  +0.000021] CPU: 17 UID: 0 PID: 196299 Comm: kworker/17:0 Tainted: G     U             6.14.0-org-staging #1
> [  +0.000032] Tainted: [U]=USER
> [  +0.000015] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS ELITE/X570 AORUS ELITE, BIOS F39 03/22/2024
> [  +0.000029] Workqueue: events amdgpu_userq_restore_worker [amdgpu]
> [  +0.000426] RIP: 0010:drm_exec_lock_obj+0x32/0x210 [drm_exec]
> [  +0.000025] Code: e5 41 57 41 56 41 55 49 89 f5 41 54 49 89 fc 48 83 ec 08 4c 8b 77 30 4d 85 f6 0f 85 c0 00 00 00 4c 8d 7f 08 48 39 77 38 74 54 <49> 8b bd f8 00 00 00 4c 89 fe 41 f6 04 24 01 75 3c e8 08 50 bc e0
> [  +0.000046] RSP: 0018:ffffab1b04da3ce8 EFLAGS: 00010297
> [  +0.000020] RAX: 0000000000000001 RBX: ffff930cc60e4bc0 RCX: 0000000000000000
> [  +0.000025] RDX: 0000000000000004 RSI: 0000000000000048 RDI: ffffab1b04da3d88
> [  +0.000028] RBP: ffffab1b04da3d10 R08: ffff930cc60e4000 R09: 0000000000000000
> [  +0.000022] R10: ffffab1b04da3d18 R11: 0000000000000001 R12: ffffab1b04da3d88
> [  +0.000023] R13: 0000000000000048 R14: 0000000000000000 R15: ffffab1b04da3d90
> [  +0.000023] FS:  0000000000000000(0000) GS:ffff9313dea80000(0000) knlGS:0000000000000000
> [  +0.000024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  +0.000021] CR2: 0000000000000140 CR3: 000000018351a000 CR4: 0000000000350ef0
> [  +0.000025] Call Trace:
> [  +0.000018]  <TASK>
> [  +0.000015]  ? show_regs+0x69/0x80
> [  +0.000022]  ? __die+0x25/0x70
> [  +0.000019]  ? page_fault_oops+0x15d/0x510
> [  +0.000024]  ? do_user_addr_fault+0x312/0x690
> [  +0.000024]  ? sched_clock_cpu+0x10/0x1a0
> [  +0.000028]  ? exc_page_fault+0x78/0x1b0
> [  +0.000025]  ? asm_exc_page_fault+0x27/0x30
> [  +0.000024]  ? drm_exec_lock_obj+0x32/0x210 [drm_exec]
> [  +0.000024]  drm_exec_prepare_obj+0x21/0x60 [drm_exec]
> [  +0.000021]  amdgpu_vm_lock_pd+0x22/0x30 [amdgpu]
> [  +0.000266]  amdgpu_userq_validate_bos+0x6c/0x320 [amdgpu]
> [  +0.000333]  amdgpu_userq_restore_worker+0x4a/0x120 [amdgpu]
> [  +0.000316]  process_one_work+0x189/0x3c0
> [  +0.000021]  worker_thread+0x2a4/0x3b0
> [  +0.000022]  kthread+0x109/0x220
> [  +0.000018]  ? __pfx_worker_thread+0x10/0x10
> [  +0.000779]  ? _raw_spin_unlock_irq+0x1f/0x40
> [  +0.000560]  ? __pfx_kthread+0x10/0x10
> [  +0.000543]  ret_from_fork+0x3c/0x60
> [  +0.000507]  ? __pfx_kthread+0x10/0x10
> [  +0.000515]  ret_from_fork_asm+0x1a/0x30
> [  +0.000515]  </TASK>
> 
> v2: Replace cancel_delayed_work() with cancel_delayed_work_sync()
>     in amdgpu_userq_destroy() and amdgpu_userq_evict() as well.
> 
> Cc: Alex Deucher <alexander.deucher at amd.com>
> Cc: Christian König <christian.koenig at amd.com>
> Cc: Sunil Khatri <sunil.khatri at amd.com>
> Signed-off-by: Arvind Yadav <arvind.yadav at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> index afbe01149ed3..c7c927db24ab 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
> @@ -300,7 +300,7 @@ amdgpu_userq_destroy(struct drm_file *filp, int queue_id)
>         struct amdgpu_usermode_queue *queue;
>         int r = 0;
> 
> -       cancel_delayed_work(&uq_mgr->resume_work);
> +       cancel_delayed_work_sync(&uq_mgr->resume_work);
>         mutex_lock(&uq_mgr->userq_mutex);
> 
>         queue = amdgpu_userq_find(uq_mgr, queue_id);
> @@ -745,7 +745,7 @@ amdgpu_userq_evict(struct amdgpu_userq_mgr *uq_mgr,
>         amdgpu_eviction_fence_signal(evf_mgr, ev_fence);
> 
>         if (evf_mgr->fd_closing) {
> -               cancel_delayed_work(&uq_mgr->resume_work);
> +               cancel_delayed_work_sync(&uq_mgr->resume_work);
>                 return;
>         }
> 
> @@ -774,7 +774,7 @@ void amdgpu_userq_mgr_fini(struct amdgpu_userq_mgr *userq_mgr)
>         struct amdgpu_userq_mgr *uqm, *tmp;
>         uint32_t queue_id;
> 
> -       cancel_delayed_work(&userq_mgr->resume_work);
> +       cancel_delayed_work_sync(&userq_mgr->resume_work);
> 
>         mutex_lock(&userq_mgr->userq_mutex);
>         idr_for_each_entry(&userq_mgr->userq_idr, queue, queue_id) {
> --
> 2.34.1
>