[PATCH 1/1] drm/i915/active: Stop using active barriers as fence trackers

Janusz Krzysztofik janusz.krzysztofik at linux.intel.com
Thu Feb 2 19:40:53 UTC 2023


When adding a request to a composite tracker, we try to use an existing
fence tracker already registered with that composite tracker.  The fence
tracker we obtain can already track another fence, can be an idle barrier,
or an active barrier.  The first two cases are easy to handle and we seem
to do that correctly.  In the last case, we attempt to replace the active
barrier with our request.  However, when trying to delete that barrier
from a list of active barriers it belongs to, we ignore return value from
that operation, which informs us whether the deletion succeeded or not.
As a consequence, we may end up reusing a tracker that can been still
processed by another thread as a barrier.  Since the same field of the
tracker structure is used as a list node with both active barriers lists
and request's callbacks lists, our request's callbacks list we add the
tracker to can then get corrupted mysteriously.

This patch doesn't fix the root cause of failed deletions of active
barriers form their lists, only prevents from taking over such still
active barriers, triggering a bug if that happens, more informative than
unexplained list corruptions.  It is intended for inclusion in a series,
right after a patch that will fix the root cause of failed barrier
removals, just in case we break something with barrier list processing in
the future, so we won't get those mysterious list corruptions again but
rather a clear indication of what's going wrong.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik at linux.intel.com>
---
 drivers/gpu/drm/i915/i915_active.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index 7412abf166a8c..01c47f1c796df 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -421,9 +421,10 @@ replace_barrier(struct i915_active *ref, struct i915_active_fence *active)
 	 * This request is on the kernel_context timeline, and so
 	 * we can use it to substitute for the pending idle-barrer
 	 * request that we want to emit on the kernel_context.
+	 * Our users want to know if our attempt to delete that
+	 * tracker from the list of engine's barrier tasks succeeded.
 	 */
-	__active_del_barrier(ref, node_from_active(active));
-	return true;
+	return __active_del_barrier(ref, node_from_active(active));
 }
 
 int i915_active_add_request(struct i915_active *ref, struct i915_request *rq)
@@ -447,6 +448,8 @@ int i915_active_add_request(struct i915_active *ref, struct i915_request *rq)
 		RCU_INIT_POINTER(active->fence, NULL);
 		atomic_dec(&ref->count);
 	}
+	/* The tracker must no longer be a barrier before we can use it */
+	GEM_BUG_ON(is_barrier(active));
 	if (!__i915_active_fence_set(active, fence))
 		__i915_active_acquire(ref);
 
-- 
2.25.1



More information about the Intel-gfx-trybot mailing list