[RFC 3/3] drm/amdgpu: Add locking to amdgpu_ctx_mgr_entity_fini()
Christian König
christian.koenig at amd.com
Tue May 20 07:39:48 UTC 2025
On 5/19/25 18:37, Tvrtko Ursulin wrote:
> Amdgpu_ctx_mgr_entity_fini() walks the context IDR unlocked so question is
> could it in theory see a stale entry and attempt to destroy the
> drm_sched_entity twice?
No, when this function is called when the file descriptor is freed up.
So there should never ever be any concurrent user of this.
> Problem is I have hit this on a KASAN enabled kernel only _once_ and never
> since. In that case it reported amdgpu_ctx_ioctl() had freed the entity
> already which would mean the question is could we possibly go through the
> mutex_unlock() on one CPU, and another CPU to follow immediately with
> file->release (DRM postclose) and see the stale entry.
Mhm, it would basically mean that the file descriptor can be closed while some IOCTL on it is still ongoing.
I think that this would be extremely ugly and should never ever happen in the first place. Adding the mutex just band aids the issue, but not really fixes it.
> Throwing it out there not to forget about it, since I have manage to
> lose the KASAN trace already..
If you manage to trigger that again please send it to me ASAP.
Thanks,
Christian.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
> Cc: Alex Deucher <alexander.deucher at amd.com>
> Cc: Christian König <christian.koenig at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> index 85567d0d9545..95b005ed839e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> @@ -927,6 +927,7 @@ static void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr)
>
> idp = &mgr->ctx_handles;
>
> + mutex_lock(&mgr->lock);
> idr_for_each_entry(idp, ctx, id) {
> if (kref_read(&ctx->refcount) != 1) {
> DRM_ERROR("ctx %p is still alive\n", ctx);
> @@ -945,6 +946,7 @@ static void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr)
> }
> }
> }
> + mutex_unlock(&mgr->lock);
> }
>
> void amdgpu_ctx_mgr_fini(struct amdgpu_ctx_mgr *mgr)
More information about the amd-gfx
mailing list