[PATCH 2/2] drm/i915/active: Fix misuse of non-idle barriers

Janusz Krzysztofik janusz.krzysztofik at linux.intel.com
Fri Feb 3 17:01:13 UTC 2023


When adding a request to a composite tracker, we try to use an existing
fence tracker already registered with that composite tracker.  The fence
tracker we obtain can already track another fence, can be an idle barrier,
or an active barrier.  The first two cases are easy to handle and we seem
to do that correctly.  In the last case, we attempt to replace the active
barrier with our request.  However, when trying to delete that barrier
from a list of active barriers it belongs to, we ignore return value from
that operation, which informs us whether the deletion succeeded or not.
As a consequence, we may end up reusing a tracker that can been still
processed by another thread as a barrier.  Since the same field of the
tracker structure is used as a list node with both active barriers lists
and request's callbacks lists, our request's callbacks list we add the
tracker to can then get corrupted mysteriously.

Respect results of barrier deletion attempts and trigger an oops if the
processed tracker occurs still a non-idle barrier.  While this should no
longer happen after the root cause of barrier deletion failures has been
fixed with the preceding patch, this fix can occur useful in case we ever
break something with barrier processing, so we never get those mysterious
list corruptions again but rather a clear indication of what's gone wrong.

References: https://gitlab.freedesktop.org/drm/intel/-/issues/6333
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik at linux.intel.com>
---
 drivers/gpu/drm/i915/i915_active.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index 320840eea30ce..0d0eda9e22b9e 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -424,9 +424,10 @@ replace_barrier(struct i915_active *ref, struct i915_active_fence *active)
 	 * This request is on the kernel_context timeline, and so
 	 * we can use it to substitute for the pending idle-barrer
 	 * request that we want to emit on the kernel_context.
+	 * Our users want to know if our attempt to delete that
+	 * tracker from the list of engine's barrier tasks succeeded.
 	 */
-	__active_del_barrier(ref, node_from_active(active));
-	return true;
+	return __active_del_barrier(ref, node_from_active(active));
 }
 
 int i915_active_add_request(struct i915_active *ref, struct i915_request *rq)
@@ -450,6 +451,8 @@ int i915_active_add_request(struct i915_active *ref, struct i915_request *rq)
 		RCU_INIT_POINTER(active->fence, NULL);
 		atomic_dec(&ref->count);
 	}
+	/* The tracker must no longer be a barrier before we can use it */
+	GEM_BUG_ON(is_barrier(active));
 	if (!__i915_active_fence_set(active, fence))
 		__i915_active_acquire(ref);
 
-- 
2.25.1



More information about the Intel-gfx-trybot mailing list