[RFC 3/3] drm/amdgpu: Add locking to amdgpu_ctx_mgr_entity_fini()

Mon May 19 16:37:13 UTC 2025

Amdgpu_ctx_mgr_entity_fini() walks the context IDR unlocked so question is
could it in theory see a stale entry and attempt to destroy the
drm_sched_entity twice?

Problem is I have hit this on a KASAN enabled kernel only _once_ and never
since. In that case it reported amdgpu_ctx_ioctl() had freed the entity
already which would mean the question is could we possibly go through the
mutex_unlock() on one CPU, and another CPU to follow immediately with
file->release (DRM postclose) and see the stale entry.

Throwing it out there not to forget about it, since I have manage to
lose the KASAN trace already..

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
Cc: Alex Deucher <alexander.deucher at amd.com>
Cc: Christian König <christian.koenig at amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 85567d0d9545..95b005ed839e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -927,6 +927,7 @@ static void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr)
 
 	idp = &mgr->ctx_handles;
 
+	mutex_lock(&mgr->lock);
 	idr_for_each_entry(idp, ctx, id) {
 		if (kref_read(&ctx->refcount) != 1) {
 			DRM_ERROR("ctx %p is still alive\n", ctx);
@@ -945,6 +946,7 @@ static void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr)
 			}
 		}
 	}
+	mutex_unlock(&mgr->lock);
 }
 
 void amdgpu_ctx_mgr_fini(struct amdgpu_ctx_mgr *mgr)
-- 
2.48.0