[PATCH] drm/i915/guc: Fix the fix for reset lock confusion
Andi Shyti
andi.shyti at kernel.org
Wed Apr 3 21:50:00 UTC 2024
Hi John,
On Fri, Mar 29, 2024 at 04:53:05PM -0700, John.C.Harrison at Intel.com wrote:
> From: John Harrison <John.C.Harrison at Intel.com>
>
> The previous fix for the circlular lock splat about the busyness
> worker wasn't quite complete. Even though the reset-in-progress flag
> is cleared at the start of intel_uc_reset_finish, the entire function
> is still inside the reset mutex lock. Not sure why the patch appeared
> to fix the issue both locally and in CI. However, it is now back
> again.
>
> There is a further complication the wedge code path within
> intel_gt_reset() jumps around so much it results in nested
> reset_prepare/_finish calls. That is, the call sequence is:
> intel_gt_reset
> | reset_prepare
> | __intel_gt_set_wedged
> | | reset_prepare
> | | reset_finish
> | reset_finish
>
> The nested finish means that even if the clear of the in-progress flag
> was moved to the end of _finish, it would still be clear for the
> entire second call. Surprisingly, this does not seem to be causing any
> other problems at present.
>
> As an aside, a wedge on fini does not call the finish functions at
> all. The reset_in_progress flag is left set (twice).
>
> So instead of trying to cancel the worker anywhere at all in the reset
> path, just add a cancel to intel_guc_submission_fini instead. Note
> that it is not a problem if the worker is still active during a reset.
> Either it will run before the reset path starts locking things and
> will simply block the reset code for a tiny amount of time. Or it will
> run after the locks have been acquired and will early exit due to the
> try-lock.
>
> Also, do not use the reset-in-progress flag to decide whether a
> synchronous cancel is safe (from a lockdep perspective) or not.
> Instead, use the actual reset mutex state (both the genuine one and
> the custom rolled BACKOFF one).
>
> Fixes: 0e00a8814eec ("drm/i915/guc: Avoid circular locking issue on busyness flush")
> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
> Cc: Zhanjun Dong <zhanjun.dong at intel.com>
> Cc: John Harrison <John.C.Harrison at Intel.com>
> Cc: Andi Shyti <andi.shyti at linux.intel.com>
> Cc: Daniel Vetter <daniel at ffwll.ch>
> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
> Cc: Nirmoy Das <nirmoy.das at intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa at intel.com>
> Cc: Andrzej Hajda <andrzej.hajda at intel.com>
> Cc: Matt Roper <matthew.d.roper at intel.com>
> Cc: Jonathan Cavitt <jonathan.cavitt at intel.com>
> Cc: Prathap Kumar Valsan <prathap.kumar.valsan at intel.com>
> Cc: Alan Previn <alan.previn.teres.alexis at intel.com>
> Cc: Madhumitha Tolakanahalli Pradeep <madhumitha.tolakanahalli.pradeep at intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
> Cc: Ashutosh Dixit <ashutosh.dixit at intel.com>
> Cc: Dnyaneshwar Bhadane <dnyaneshwar.bhadane at intel.com>
Looks good:
Reviewed-by: Andi Shyti <andi.shyti at linux.intel.com>
Thanks,
Andi
More information about the Intel-gfx
mailing list