[PATCH] drm/amdkfd: Fix an eviction fence leak
Felix Kuehling
felix.kuehling at amd.com
Fri Sep 27 19:48:29 UTC 2024
On 2024-09-27 06:36, Lang Yu wrote:
> dma_fence_get/put() should be called balanced in
> init_kfd_vm() and amdgpu_amdkfd_gpuvm_destroy_cb().
I don't think that's correct. The reference taken in init_kfd_vm is
returned to the caller of amdgpu_amdkfd_gpuvm_acquire_process_vm, which
stores it in the kfd_process structure. I think it's that caller's
responsibility to drop its reference. I think the real problem is
that we're creating a new reference for each VM, but there is only one
kfd_process structure per process. So the RCU_INIT_POINTER(p->ef, ef);
in kfd_process_device_init_vm leaks the previous references.
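To illustrate the leak described above, here is a minimal user-space sketch. It is not kernel code: the toy_* names are hypothetical, and the model only tracks a bare reference count, assuming one reference owned by process_info plus one per call that stores into p->ef.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dma_fence: only the refcount is modeled. */
struct toy_fence {
	int refcount;
};

/* Hypothetical stand-in for struct kfd_process: one ef pointer per process. */
struct toy_process {
	struct toy_fence *ef;
};

static struct toy_fence *toy_fence_get(struct toy_fence *f)
{
	f->refcount++;
	return f;
}

static void toy_fence_put(struct toy_fence *f)
{
	assert(f->refcount > 0);
	f->refcount--;
}

/*
 * Models the behavior being discussed: every VM init takes a new
 * reference and overwrites p->ef without dropping the old one.
 */
static void toy_init_vm_leaky(struct toy_process *p,
			      struct toy_fence *eviction_fence)
{
	struct toy_fence *ef = toy_fence_get(eviction_fence);

	p->ef = ef;	/* the reference previously held in p->ef is lost */
}

/* Models process teardown: drops the single reference held in p->ef. */
static void toy_process_release(struct toy_process *p)
{
	if (p->ef) {
		toy_fence_put(p->ef);
		p->ef = NULL;
	}
}

/* Returns how many references are left over after a full lifecycle. */
static int toy_leaked_refs(int n_vms)
{
	struct toy_fence fence = { .refcount = 1 };	/* owned by process_info */
	struct toy_process p = { .ef = NULL };

	for (int i = 0; i < n_vms; i++)
		toy_init_vm_leaky(&p, &fence);

	toy_process_release(&p);
	toy_fence_put(&fence);	/* process_info drops its own reference */

	return fence.refcount;
}
```

In this model a process with a single VM ends up balanced, while each additional VM leaves one reference behind.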
Since we only need to get the eviction fence reference when creating the
first VM, I suggest this fix in kfd_process_device_init_vm:
 	ret = amdgpu_amdkfd_gpuvm_acquire_process_vm(dev->adev, avm,
 						     &p->kgd_process_info,
-						     &ef);
+						     p->ef ? NULL : &ef);
And in init_kfd_vm:
 	if (ef)
-		*ef = dma_fence_get(&vm->process_info->eviction_fence->base);
+		*ef = dma_fence_get(&vm->process_info->eviction_fence->base);
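As a rough check of the suggestion above, here is a small user-space model of the two changes. The mock_* names are hypothetical and only a bare reference count is tracked. One assumption goes beyond what is written above: the model stores into p->ef only when a new reference was actually returned, so that a later VM does not overwrite the existing pointer with NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct dma_fence and struct kfd_process. */
struct mock_fence {
	int refcount;
};

struct mock_process {
	struct mock_fence *ef;
};

/* Models init_kfd_vm: take a reference only if the caller asked for one. */
static void mock_init_kfd_vm(struct mock_fence *eviction_fence,
			     struct mock_fence **ef)
{
	if (ef) {
		eviction_fence->refcount++;
		*ef = eviction_fence;
	}
}

/* Models kfd_process_device_init_vm passing "p->ef ? NULL : &ef". */
static void mock_process_device_init_vm(struct mock_process *p,
					struct mock_fence *eviction_fence)
{
	struct mock_fence *ef = NULL;

	mock_init_kfd_vm(eviction_fence, p->ef ? NULL : &ef);

	/* Assumption: only store when a new reference was taken. */
	if (ef)
		p->ef = ef;
}

/* Returns how many references are left over after a full lifecycle. */
static int mock_leaked_refs(int n_vms)
{
	struct mock_fence fence = { .refcount = 1 };	/* owned by process_info */
	struct mock_process p = { .ef = NULL };

	for (int i = 0; i < n_vms; i++)
		mock_process_device_init_vm(&p, &fence);

	/* Process teardown drops the one reference held in p->ef. */
	if (p.ef) {
		assert(fence.refcount > 0);
		fence.refcount--;
		p.ef = NULL;
	}

	/* process_info drops its own reference. */
	assert(fence.refcount > 0);
	fence.refcount--;

	return fence.refcount;
}
```

With the reference taken only for the first VM, the count in this model returns to zero regardless of how many VMs the process creates.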
Regards,
Felix
>
> Fixes: 9a1c1339abf9 ("drm/amdkfd: Run restore_workers on freezable WQs")
>
> Signed-off-by: Lang Yu <lang.yu at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index ce5ca304dba9..c3a4f8d297f7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -1586,6 +1586,7 @@ void amdgpu_amdkfd_gpuvm_destroy_cb(struct amdgpu_device *adev,
>
>  	/* Update process info */
>  	mutex_lock(&process_info->lock);
> +	dma_fence_put(&process_info->eviction_fence->base);
>  	process_info->n_vms--;
>  	list_del(&vm->vm_list_node);
>  	mutex_unlock(&process_info->lock);
> @@ -1598,7 +1599,6 @@ void amdgpu_amdkfd_gpuvm_destroy_cb(struct amdgpu_device *adev,
>  	WARN_ON(!list_empty(&process_info->userptr_valid_list));
>  	WARN_ON(!list_empty(&process_info->userptr_inval_list));
>
> -	dma_fence_put(&process_info->eviction_fence->base);
>  	cancel_delayed_work_sync(&process_info->restore_userptr_work);
>  	put_pid(process_info->pid);
>  	mutex_destroy(&process_info->lock);