[RFC 3/3] drm/amdgpu: Add locking to amdgpu_ctx_mgr_entity_fini()

Christian König christian.koenig at amd.com
Tue May 20 07:39:48 UTC 2025


On 5/19/25 18:37, Tvrtko Ursulin wrote:
> Amdgpu_ctx_mgr_entity_fini() walks the context IDR unlocked so question is
> could it in theory see a stale entry and attempt to destroy the
> drm_sched_entity twice?

No, when this function is called when the file descriptor is freed up.

So there should never ever be any concurrent user of this.

> Problem is I have hit this on a KASAN enabled kernel only _once_ and never
> since. In that case it reported amdgpu_ctx_ioctl() had freed the entity
> already which would mean the question is could we possibly go through the
> mutex_unlock() on one CPU, and another CPU to follow immediately with
> file->release (DRM postclose) and see the stale entry.

Mhm, it would basically mean that the file descriptor can be closed while some IOCTL on it is still ongoing.

I think that this would be extremely ugly and should never ever happen in the first place. Adding the mutex just band aids the issue, but not really fixes it.

> Throwing it out there not to forget about it, since I have manage to
> lose the KASAN trace already..

If you manage to trigger that again please send it to me ASAP.

Thanks,
Christian.

> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
> Cc: Alex Deucher <alexander.deucher at amd.com>
> Cc: Christian König <christian.koenig at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> index 85567d0d9545..95b005ed839e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> @@ -927,6 +927,7 @@ static void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr)
>  
>  	idp = &mgr->ctx_handles;
>  
> +	mutex_lock(&mgr->lock);
>  	idr_for_each_entry(idp, ctx, id) {
>  		if (kref_read(&ctx->refcount) != 1) {
>  			DRM_ERROR("ctx %p is still alive\n", ctx);
> @@ -945,6 +946,7 @@ static void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr)
>  			}
>  		}
>  	}
> +	mutex_unlock(&mgr->lock);
>  }
>  
>  void amdgpu_ctx_mgr_fini(struct amdgpu_ctx_mgr *mgr)



More information about the amd-gfx mailing list