[Intel-gfx] [PATCH v1 2/3] drm/i915/guc: Close deregister-context race against CT-loss

Teres Alexis, Alan Previn alan.previn.teres.alexis at intel.com
Thu Aug 10 03:39:10 UTC 2023


On Wed, 2023-08-02 at 16:35 -0700, Teres Alexis, Alan Previn wrote:
> If we are at the end of suspend or very early in resume
> it's possible an async fence signal could lead us to the
> execution of the context destruction worker (after the
> prior worker flush).
> 
alan:snip
>  
>  static void __guc_context_destroy(struct intel_context *ce)
> @@ -3270,7 +3287,20 @@ static void deregister_destroyed_contexts(struct intel_guc *guc)
>  		if (!ce)
>  			break;
>  
> -		guc_lrc_desc_unpin(ce);
> +		if (guc_lrc_desc_unpin(ce)) {
> +			/*
> +			 * This means GuC's CT link severed mid-way which only happens
> +			 * in suspend-resume corner cases. In this case, put the
> +			 * context back into the destroyed_contexts list which will
> +			 * get picked up on the next context deregistration event or
> +			 * purged in a GuC sanitization event (reset/unload/wedged/...).
> +			 */
> +			spin_lock_irqsave(&guc->submission_state.lock, flags);
> +			list_add_tail(&ce->destroyed_link,
> +				      &guc->submission_state.destroyed_contexts);
alan: I completely missed the fact that this new code sits inside a
while (!list_empty(&guc->submission_state.destroyed_contexts)) block,
so putting the context back on the list means the list never empties
and the while loop spins forever.

Will fix and rerev.

> +			spin_unlock_irqrestore(&guc->submission_state.lock, flags);
> +		}
> +
>  	}
>  }
>  
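
For illustration only, here is a minimal userspace sketch of the problem
described above. It is not i915 code: ctx_queue, try_deregister() and
drain() are hypothetical stand-ins for the destroyed_contexts list,
guc_lrc_desc_unpin() and the while loop in
deregister_destroyed_contexts(). The bail_on_failure flag shows one
possible way out (stop draining after re-queuing); the actual rerev may
well take a different approach.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CTX 8

/* Hypothetical stand-in for guc->submission_state.destroyed_contexts. */
struct ctx_queue {
	int ids[MAX_CTX];
	size_t head;
	size_t count;
};

static void queue_push_tail(struct ctx_queue *q, int id)
{
	assert(q->count < MAX_CTX);
	q->ids[(q->head + q->count) % MAX_CTX] = id;
	q->count++;
}

static int queue_pop_head(struct ctx_queue *q)
{
	int id;

	assert(q->count > 0);
	id = q->ids[q->head];
	q->head = (q->head + 1) % MAX_CTX;
	q->count--;
	return id;
}

/*
 * Hypothetical stand-in for guc_lrc_desc_unpin(): returns nonzero when
 * the (simulated) CT link is down and the deregistration cannot be sent.
 */
static int try_deregister(bool ct_link_up)
{
	return ct_link_up ? 0 : -1;
}

/*
 * Drain the queue the way the patched loop does. Returns the number of
 * loop iterations, capped at max_iters so that the livelock can be
 * observed without actually hanging.
 */
static size_t drain(struct ctx_queue *q, bool ct_link_up,
		    bool bail_on_failure, size_t max_iters)
{
	size_t iters = 0;

	while (q->count > 0 && iters < max_iters) {
		int id = queue_pop_head(q);

		iters++;
		if (try_deregister(ct_link_up)) {
			/* Re-queue for a later attempt ... */
			queue_push_tail(q, id);
			/* ... and stop, or the queue never becomes empty. */
			if (bail_on_failure)
				break;
		}
	}
	return iters;
}
```

With the link down and bail_on_failure false, every popped entry goes
straight back onto the queue, so only the max_iters cap ends the loop.
With bail_on_failure true, the loop exits after the first failed attempt
and leaves the entries queued for the next deregistration event.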


