[PATCH] drm/i915/guc: Avoid circular locking issue on busyness flush

Andi Shyti andi.shyti at linux.intel.com
Wed Dec 20 23:25:28 UTC 2023


Hi John,

On Tue, Dec 19, 2023 at 11:59:57AM -0800, John.C.Harrison at Intel.com wrote:
> From: John Harrison <John.C.Harrison at Intel.com>
> 
> Avoid the following lockdep complaint:
> <4> [298.856498] ======================================================
> <4> [298.856500] WARNING: possible circular locking dependency detected
> <4> [298.856503] 6.7.0-rc5-CI_DRM_14017-g58ac4ffc75b6+ #1 Tainted: G
>     N
> <4> [298.856505] ------------------------------------------------------
> <4> [298.856507] kworker/4:1H/190 is trying to acquire lock:
> <4> [298.856509] ffff8881103e9978 (&gt->reset.backoff_srcu){++++}-{0:0}, at:
> _intel_gt_reset_lock+0x35/0x380 [i915]
> <4> [298.856661]
> but task is already holding lock:
> <4> [298.856663] ffffc900013f7e58
> ((work_completion)(&(&guc->timestamp.work)->work)){+.+.}-{0:0}, at:
> process_scheduled_works+0x264/0x530
> <4> [298.856671]
> which lock already depends on the new lock.
> 
> The complaint is not actually valid. The busyness worker thread does
> indeed hold the worker lock and then attempt to acquire the reset lock
> (which may have happened in reverse order elsewhere). However, it does
> so with a trylock that exits if the reset lock is not available
> (specifically to prevent this and other similar deadlocks).
> Unfortunately, lockdep does not understand the trylock semantics (the
> lock is an i915 specific custom implementation for resets).
> 
> Not doing a synchronous flush of the worker thread when a reset is in
> progress resolves the lockdep splat by never even attempting to grab
> the lock in this particular scenario.
> 
> There are situatons where a synchronous cancel is required, however.
> So, always do the synchronous cancel if not in reset. And add an extra
> synchronous cancel to the end of the reset flow to account for when a
> reset is occurring at driver shutdown and the cancel is no longer
> synchronous but could lead to unallocated memory accesses if the
> worker is not stopped.
> 
> Signed-off-by: Zhanjun Dong <zhanjun.dong at intel.com>
> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
> Cc: Andi Shyti <andi.shyti at linux.intel.com>
> Cc: Daniel Vetter <daniel at ffwll.ch>

Reviewed-by: Andi Shyti <andi.shyti at linux.intel.com>

Thanks,
Andi


More information about the Intel-gfx mailing list