[PATCH] drm/i915/guc: Handle race condition where wakeref count drops below 0

Jesus Narvaez jesus.narvaez at intel.com
Thu May 15 18:37:52 UTC 2025


There is a rare race condition when preparing for a reset where
guc_lrc_desc_unpin() could be in the process of deregistering a context
while a different thread is scrubbing outstanding contexts and it alters
the context state and does a wakeref put. Then, if there is a failure
with deregister_context(), a second wakeref put could occur. As a result
the wakeref count could drop below 0 and fail an INTEL_WAKEREF_BUG_ON()
check.

Therefore if there is a failure with deregister_context(), undo the
context state changes and do a wakeref put only if the context was set
to be destroyed earlier.

Fixes: 2f2cc53b5fe7 ("drm/i915/guc: Close deregister-context race against CT-loss")
Signed-off-by: Jesus Narvaez <jesus.narvaez at intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
Cc: Alan Previn <alan.previn.teres.alexis at intel.com>
Cc: Anshuman Gupta <anshuman.gupta at intel.com>
Cc: Mousumi Jana <mousumi.jana at intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
Cc: Matt Roper <matthew.d.roper at intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c    | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index f8cb7c630d5b..f066c2dd7114 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -3441,20 +3441,26 @@ static inline int guc_lrc_desc_unpin(struct intel_context *ce)
 
 	/*
 	 * GuC is active, lets destroy this context, but at this point we can still be racing
-	 * with suspend, so we undo everything if the H2G fails in deregister_context so
-	 * that GuC reset will find this context during clean up.
+	 * with suspend, so we undo everything if the H2G fails in deregister_context
+	 * and if the context was scheduled to be destroyed so that GuC reset will
+	 * find this context during clean up.
 	 */
 	ret = deregister_context(ce, ce->guc_id.id);
 	if (ret) {
+		bool pending_destroyed;
 		spin_lock_irqsave(&ce->guc_state.lock, flags);
-		set_context_registered(ce);
-		clr_context_destroyed(ce);
+		pending_destroyed = context_destroyed(ce);
+		if (pending_destroyed) {
+			set_context_registered(ce);
+			clr_context_destroyed(ce);
+		}
 		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 		/*
 		 * As gt-pm is awake at function entry, intel_wakeref_put_async merely decrements
 		 * the wakeref immediately but per function spec usage call this after unlock.
 		 */
-		intel_wakeref_put_async(&gt->wakeref);
+		if (pending_destroyed)
+			intel_wakeref_put_async(&gt->wakeref);
 	}
 
 	return ret;
-- 
2.34.1



More information about the Intel-gfx mailing list