[PATCH 1/1] drm/amdgpu: Fix double release KFD pasid
Christian König
ckoenig.leichtzumerken at gmail.com
Tue Dec 13 15:57:05 UTC 2022
On 13.12.22 at 16:49, Philip Yang wrote:
> If amdgpu_amdkfd_gpuvm_acquire_process_vm fails after the vm has been
> converted to a KFD vm and vm->pasid has been set to the KFD pasid, KFD
> does not take the pdd->drm_file reference. As a result, the drm close file
> handler may be called to release the KFD pasid before KFD process destroy
> releases the same pasid and sets vm->pasid to zero. This generates the
> WARNING backtrace and NULL pointer access below.
Well, NAK. If you fail after making the VM a compute VM, the correct
approach would be to undo that conversion again in the error handling.
Since we don't need to reallocate anything, that should also never fail.
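
To illustrate the shape of that error handling, here is a minimal
userspace sketch. It is not amdgpu code: struct toy_vm, struct
toy_saved_state and all toy_* functions are hypothetical stand-ins, and
the real conversion and undo paths in the driver may look different. The
only point is that the undo step just restores saved fields, allocates
nothing, and therefore cannot fail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the VM state that matters here. */
struct toy_vm {
	unsigned int pasid;       /* pasid currently attached to the VM */
	bool is_compute_context;  /* true once converted to a compute VM */
};

/* State remembered so the conversion can be undone. */
struct toy_saved_state {
	unsigned int pasid;
	bool is_compute_context;
};

/* Convert to a compute VM and attach the KFD pasid (toy version). */
static void toy_vm_make_compute(struct toy_vm *vm, unsigned int kfd_pasid,
				struct toy_saved_state *saved)
{
	saved->pasid = vm->pasid;
	saved->is_compute_context = vm->is_compute_context;

	vm->pasid = kfd_pasid;
	vm->is_compute_context = true;
}

/* Undo the conversion: only restores fields, so it cannot fail. */
static void toy_vm_undo_compute(struct toy_vm *vm,
				const struct toy_saved_state *saved)
{
	vm->pasid = saved->pasid;
	vm->is_compute_context = saved->is_compute_context;
}

/*
 * Acquire the VM for a compute process. later_step_fails simulates an
 * error that happens after the conversion has already been done.
 */
static int toy_acquire_process_vm(struct toy_vm *vm, unsigned int kfd_pasid,
				  bool later_step_fails)
{
	struct toy_saved_state saved;

	toy_vm_make_compute(vm, kfd_pasid, &saved);

	if (later_step_fails) {
		/* Error path: put the VM back the way we found it. */
		toy_vm_undo_compute(vm, &saved);
		return -1;
	}

	return 0;
}
```

With the VM restored on failure, the close path still sees the pasid it
owns and is the only one releasing it, so no special case is needed
there.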
Christian.
>
> For a compute process, KFD manages the pasid, so the drm close file
> handler should not release the KFD pasid, to avoid a double release.
>
> amdgpu: Failed to create process VM object
>
> ida_free called for id=32770 which is not allocated.
> WARNING: CPU: 57 PID: 72542 at ../lib/idr.c:522 ida_free+0x96/0x140
> RIP: 0010:ida_free+0x96/0x140
> Call Trace:
> amdgpu_pasid_free_delayed+0xe1/0x2a0 [amdgpu]
> amdgpu_driver_postclose_kms+0x2d8/0x340 [amdgpu]
> drm_file_free.part.13+0x216/0x270 [drm]
> drm_close_helper.isra.14+0x60/0x70 [drm]
> drm_release+0x6e/0xf0 [drm]
> __fput+0xcc/0x280
> ____fput+0xe/0x20
> task_work_run+0x96/0xc0
> do_exit+0x3d0/0xc10
>
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> RIP: 0010:ida_free+0x76/0x140
> Call Trace:
> amdgpu_pasid_free_delayed+0xe1/0x2a0 [amdgpu]
> amdgpu_driver_postclose_kms+0x2d8/0x340 [amdgpu]
> drm_file_free.part.13+0x216/0x270 [drm]
> drm_close_helper.isra.14+0x60/0x70 [drm]
> drm_release+0x6e/0xf0 [drm]
> __fput+0xcc/0x280
> ____fput+0xe/0x20
> task_work_run+0x96/0xc0
> do_exit+0x3d0/0xc10
>
> Suggested-by: Felix Kuehling <Felix.Kuehling at amd.com>
> Signed-off-by: Philip Yang <Philip.Yang at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index efc0a13e9aea..bf444c3f6656 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -1244,8 +1244,14 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
> amdgpu_bo_unreserve(adev->virt.csa_obj);
> }
>
> - pasid = fpriv->vm.pasid;
> + if (fpriv->vm.is_compute_context)
> + /* pasid managed by KFD is released when process is destroyed */
> + pasid = 0;
> + else
> + pasid = fpriv->vm.pasid;
> +
> pd = amdgpu_bo_ref(fpriv->vm.root.bo);
> +
> if (!WARN_ON(amdgpu_bo_reserve(pd, true))) {
> amdgpu_vm_bo_del(adev, fpriv->prt_va);
> amdgpu_bo_unreserve(pd);