[PATCH 1/1] drm/amdgpu: Fix double release KFD pasid

Christian König ckoenig.leichtzumerken at gmail.com
Tue Dec 13 15:57:05 UTC 2022


On 13.12.22 at 16:49, Philip Yang wrote:
> If amdgpu_amdkfd_gpuvm_acquire_process_vm fails after the vm has been
> converted to a KFD vm and vm->pasid has been set to the KFD pasid, KFD
> does not take the pdd->drm_file reference. As a result, the drm close
> file handler may be called to release the KFD pasid before KFD process
> destroy releases the same pasid and sets vm->pasid to zero. This
> generates the WARNING backtrace and NULL pointer access below.

Well, NAK. If you fail after making the VM a compute VM, the correct
approach would be to undo that in the error handling again.

Since we don't need to reallocate anything, that should also never fail.
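
As a minimal sketch of that error-handling shape, assuming hypothetical names throughout (struct demo_vm and the demo_* helpers are illustrative stand-ins, not amdgpu functions): the conversion step saves the previous state, and the error path restores it. Restoring only copies saved values back and allocates nothing, which is why the undo itself has no failure path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the VM state discussed in this thread. */
struct demo_vm {
	unsigned int pasid;		/* ID currently attached to the VM */
	bool is_compute_context;	/* true once converted to a compute VM */
};

/* Hypothetical conversion step: save the old state so it can be undone. */
static void demo_make_compute(struct demo_vm *vm, unsigned int kfd_pasid,
			      struct demo_vm *saved)
{
	*saved = *vm;
	vm->pasid = kfd_pasid;
	vm->is_compute_context = true;
}

/* Undo only copies saved values back; nothing is allocated, so it cannot fail. */
static void demo_undo_compute(struct demo_vm *vm, const struct demo_vm *saved)
{
	*vm = *saved;
}

/* later_step_fails simulates an error that occurs after the conversion. */
static int demo_acquire_process_vm(struct demo_vm *vm, unsigned int kfd_pasid,
				   bool later_step_fails)
{
	struct demo_vm saved;

	demo_make_compute(vm, kfd_pasid, &saved);
	if (later_step_fails)
		goto err_undo;

	return 0;

err_undo:
	/* Leave the VM exactly as the caller handed it in. */
	demo_undo_compute(vm, &saved);
	return -1;
}
```

With this shape, a caller that sees the error return finds the VM with its original pasid and a cleared compute flag, so the close handler has nothing KFD-owned to release.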

Christian.

>
> For a compute process, KFD manages the pasid, so the drm close file
> handler should not release the KFD pasid, to avoid a double release.
>
>   amdgpu: Failed to create process VM object
>
>   ida_free called for id=32770 which is not allocated.
>   WARNING: CPU: 57 PID: 72542 at ../lib/idr.c:522 ida_free+0x96/0x140
>   RIP: 0010:ida_free+0x96/0x140
>   Call Trace:
>    amdgpu_pasid_free_delayed+0xe1/0x2a0 [amdgpu]
>    amdgpu_driver_postclose_kms+0x2d8/0x340 [amdgpu]
>    drm_file_free.part.13+0x216/0x270 [drm]
>    drm_close_helper.isra.14+0x60/0x70 [drm]
>    drm_release+0x6e/0xf0 [drm]
>    __fput+0xcc/0x280
>    ____fput+0xe/0x20
>    task_work_run+0x96/0xc0
>    do_exit+0x3d0/0xc10
>
>   BUG: kernel NULL pointer dereference, address: 0000000000000000
>   RIP: 0010:ida_free+0x76/0x140
>   Call Trace:
>    amdgpu_pasid_free_delayed+0xe1/0x2a0 [amdgpu]
>    amdgpu_driver_postclose_kms+0x2d8/0x340 [amdgpu]
>    drm_file_free.part.13+0x216/0x270 [drm]
>    drm_close_helper.isra.14+0x60/0x70 [drm]
>    drm_release+0x6e/0xf0 [drm]
>    __fput+0xcc/0x280
>    ____fput+0xe/0x20
>    task_work_run+0x96/0xc0
>    do_exit+0x3d0/0xc10
>
> Suggested-by: Felix Kuehling <Felix.Kuehling at amd.com>
> Signed-off-by: Philip Yang <Philip.Yang at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 8 +++++++-
>   1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index efc0a13e9aea..bf444c3f6656 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -1244,8 +1244,14 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
>   		amdgpu_bo_unreserve(adev->virt.csa_obj);
>   	}
>   
> -	pasid = fpriv->vm.pasid;
> +	if (fpriv->vm.is_compute_context)
> +		/* pasid managed by KFD is released when process is destroyed */
> +		pasid = 0;
> +	else
> +		pasid = fpriv->vm.pasid;
> +
>   	pd = amdgpu_bo_ref(fpriv->vm.root.bo);
> +
>   	if (!WARN_ON(amdgpu_bo_reserve(pd, true))) {
>   		amdgpu_vm_bo_del(adev, fpriv->prt_va);
>   		amdgpu_bo_unreserve(pd);
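
For readers following the quoted diff, here is a small user-space sketch of the ownership check it proposes. Everything in it is an assumption made for illustration: the demo_* names, the fixed-size ID table, and the return codes are hypothetical and are not the kernel's ida or amdgpu APIs. It only models the idea that the close handler skips the ID when the VM is a compute context, and that freeing an ID twice is detectable.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_IDS 8

/* Hypothetical ID table; index 0 is reserved to mean "no ID". */
static bool demo_id_used[DEMO_MAX_IDS];

/* Return the lowest free ID (starting at 1), or -1 if none is left. */
static int demo_id_alloc(void)
{
	for (int i = 1; i < DEMO_MAX_IDS; i++) {
		if (!demo_id_used[i]) {
			demo_id_used[i] = true;
			return i;
		}
	}
	return -1;
}

/* Return 0 on success, -1 if the ID is out of range or not allocated. */
static int demo_id_free(unsigned int id)
{
	if (id >= DEMO_MAX_IDS || !demo_id_used[id])
		return -1;
	demo_id_used[id] = false;
	return 0;
}

struct demo_vm {
	unsigned int pasid;
	bool is_compute_context;
};

/*
 * Models the close handler: returns 1 if it freed the ID, 0 if it skipped
 * the ID because another owner manages it, -1 on a double free.
 */
static int demo_postclose(const struct demo_vm *vm)
{
	unsigned int pasid = vm->is_compute_context ? 0 : vm->pasid;

	if (!pasid)
		return 0;
	return demo_id_free(pasid) ? -1 : 1;
}
```

In this model, a compute VM's ID survives the close handler and is freed exactly once by its owner; clearing the flag while the ID has already been freed reproduces the double-release case as a -1 return.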


