[PATCH v1 2/3] drm/i915/guc: Close deregister-context race against CT-loss
Teres Alexis, Alan Previn
alan.previn.teres.alexis at intel.com
Thu Aug 10 03:39:10 UTC 2023
On Wed, 2023-08-02 at 16:35 -0700, Teres Alexis, Alan Previn wrote:
> If we are at the end of suspend or very early in resume
> it's possible an async fence signal could lead us to the
> execution of the context destruction worker (after the
> prior worker flush).
>
alan:snip
>
> static void __guc_context_destroy(struct intel_context *ce)
> @@ -3270,7 +3287,20 @@ static void deregister_destroyed_contexts(struct intel_guc *guc)
>  		if (!ce)
>  			break;
>
> -		guc_lrc_desc_unpin(ce);
> +		if (guc_lrc_desc_unpin(ce)) {
> +			/*
> +			 * This means GuC's CT link severed mid-way which only happens
> +			 * in suspend-resume corner cases. In this case, put the
> +			 * context back into the destroyed_contexts list which will
> +			 * get picked up on the next context deregistration event or
> +			 * purged in a GuC sanitization event (reset/unload/wedged/...).
> +			 */
> +			spin_lock_irqsave(&guc->submission_state.lock, flags);
> +			list_add_tail(&ce->destroyed_link,
> +				      &guc->submission_state.destroyed_contexts);
alan: I completely missed the fact that this new code sits inside a
while (!list_empty(&guc->submission_state.destroyed_contexts)) block,
so putting the context back on the list will make the while loop spin forever.
Will fix and rerev.
> +			spin_unlock_irqrestore(&guc->submission_state.lock, flags);
> +		}
> +
>  	}
>  }
>
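For readers following the loop problem described above, here is a minimal userspace sketch of it and of one possible way out: bail out of the drain loop right after re-queueing the context, so a later deregistration event can retry. This is an illustration only, not the i915 code and not necessarily the fix that will be posted. The names struct ctx, struct ctx_queue, fake_unpin() and drain_destroyed() are hypothetical stand-ins for the context, the destroyed_contexts list, guc_lrc_desc_unpin() and the worker loop; locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a context; not struct intel_context. */
struct ctx {
	int id;
	struct ctx *next;
};

/* Hypothetical stand-in for the destroyed_contexts list. */
struct ctx_queue {
	struct ctx *head;
	struct ctx *tail;
};

static bool queue_empty(const struct ctx_queue *q)
{
	return q->head == NULL;
}

static void queue_add_tail(struct ctx_queue *q, struct ctx *c)
{
	c->next = NULL;
	if (q->tail)
		q->tail->next = c;
	else
		q->head = c;
	q->tail = c;
}

static struct ctx *queue_pop_head(struct ctx_queue *q)
{
	struct ctx *c = q->head;

	if (!c)
		return NULL;
	q->head = c->next;
	if (!q->head)
		q->tail = NULL;
	c->next = NULL;
	return c;
}

/* Models an unpin that fails (non-zero) while the CT link is down. */
static int fake_unpin(const struct ctx *c, bool link_up)
{
	(void)c;
	return link_up ? 0 : -1;
}

/* Drains the queue; returns how many contexts were deregistered. */
static int drain_destroyed(struct ctx_queue *q, bool link_up)
{
	int done = 0;

	while (!queue_empty(q)) {
		struct ctx *c = queue_pop_head(q);

		assert(c != NULL);
		if (fake_unpin(c, link_up)) {
			/* Re-queue for a later pass ... */
			queue_add_tail(q, c);
			/*
			 * ... and stop here. Without this break the list
			 * never becomes empty while the link is down, so
			 * the while loop would never terminate.
			 */
			break;
		}
		done++;
	}
	return done;
}
```

With the link down, drain_destroyed() returns after the first failed unpin and leaves every context on the queue; a later call with the link up drains them all.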