[PATCH] drm/amdkfd: Fix kfd process ref leaking when userptr unmapping

Philip Yang yangp at amd.com
Wed May 28 21:22:31 UTC 2025


On 2025-05-28 14:24, Kasiviswanathan, Harish wrote:
> [Public]
>
>  From the code, it looks like you want to hold reference to the process to ensure that it doesn't get destroyed while sending the fault event to user. If that is correct, then your commit message is not reflecting that.
>
> With commit message updated, this patch is Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan at amd.com>

The code calls kfd_lookup_process_by_pid() directly as a function
argument. That call increases (holds) the kfd process ref, but nothing
decreases the refcount afterwards, so the kfd process refcount never
becomes 0 and the kfd process is left holding (leaking) resources after
the application exits. The NULL pointer check is also necessary because
kfd_lookup_process_by_pid() may return a NULL pointer if the app
process/task is already destroyed.
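
To make the pattern concrete, here is a minimal user-space sketch of the
leak and the fix. It is only a model, not the driver code: struct
model_process, model_lookup(), model_unref(), model_signal(),
notify_leaky() and notify_fixed() are hypothetical names invented for
illustration, and a plain int stands in for the kernel refcount.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_process {
	int refcount;	/* stands in for the refcount of the kfd process */
	int alive;	/* 0 once the owning task has been destroyed */
	int events;	/* number of fault events delivered */
};

/*
 * Models a lookup-by-pid helper: returns NULL if the task is gone,
 * otherwise takes a reference that the caller must drop.
 */
static struct model_process *model_lookup(struct model_process *target)
{
	if (!target->alive)
		return NULL;
	target->refcount++;
	return target;
}

/* Drops one reference taken by model_lookup(). */
static void model_unref(struct model_process *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/* Models delivering the VM fault event to user space. */
static void model_signal(struct model_process *p)
{
	p->events++;
}

/* Pre-patch shape: the looked-up reference is used but never dropped. */
static void notify_leaky(struct model_process *target)
{
	struct model_process *p = model_lookup(target);

	if (p)	/* guard kept only so the model itself cannot crash */
		model_signal(p);
	/* missing model_unref(p): the refcount stays elevated forever */
}

/* Post-patch shape: check for NULL, signal, then drop the reference. */
static void notify_fixed(struct model_process *target)
{
	struct model_process *p = model_lookup(target);

	if (p) {
		model_signal(p);
		model_unref(p);
	}
}
```

In this model, each call to notify_leaky() leaves the refcount one
higher than before, while notify_fixed() returns it to its starting
value and does nothing when the lookup returns NULL.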

Regards,

Philip

>
>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Philip Yang
> Sent: Tuesday, May 27, 2025 11:35 AM
> To: amd-gfx at lists.freedesktop.org
> Cc: Kuehling, Felix <Felix.Kuehling at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>; Yang, Philip <Philip.Yang at amd.com>
> Subject: [PATCH] drm/amdkfd: Fix kfd process ref leaking when userptr unmapping
>
> kfd_lookup_process_by_pid increases process ref, the refcount is
> leaking.
>
> Fixes: 7a566d7f56f4 ("amd/amdkfd: Trigger segfault for early userptr unmmapping")
> Signed-off-by: Philip Yang <Philip.Yang at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 9 +++++++--
>   1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 260165bbe373..99a32bc9bb89 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -2563,12 +2563,17 @@ static int update_invalid_user_pages(struct amdkfd_process_info *process_info,
>                           * from the KFD, trigger a segmentation fault in VM debug mode.
>                           */
>                          if (amdgpu_ttm_adev(bo->tbo.bdev)->debug_vm_userptr) {
> +                               struct kfd_process *p;
> +
>                                  pr_err("Pid %d unmapped memory before destroying userptr at GPU addr 0x%llx\n",
>                                                                  pid_nr(process_info->pid), mem->va);
>
>                                  // Send GPU VM fault to user space
> -                               kfd_signal_vm_fault_event_with_userptr(kfd_lookup_process_by_pid(process_info->pid),
> -                                                               mem->va);
> +                               p = kfd_lookup_process_by_pid(process_info->pid);
> +                               if (p) {
> +                                       kfd_signal_vm_fault_event_with_userptr(p, mem->va);
> +                                       kfd_unref_process(p);
> +                               }
>                          }
>
>                          ret = 0;
> --
> 2.49.0
>

