[syzbot] [dri?] WARNING in __ww_mutex_wound

John Stultz jstultz at google.com
Wed Jul 30 19:11:54 UTC 2025


On Wed, Jul 30, 2025 at 2:50 AM K Prateek Nayak <kprateek.nayak at amd.com> wrote:
> On 7/30/2025 1:57 PM, Maarten Lankhorst wrote:
> > Hey,
> >
> > This warning is introduced in linux-next as a4f0b6fef4b0 ("locking/mutex: Add p->blocked_on wrappers for correctness checks")
> > Adding relevant people from that commit.
> >
...
> >> ------------[ cut here ]------------
> >> WARNING: ./include/linux/sched.h:2173 at __clear_task_blocked_on include/linux/sched.h:2173 [inline], CPU#1: syz.1.8698/395
> >> WARNING: ./include/linux/sched.h:2173 at __ww_mutex_wound+0x21a/0x2b0 kernel/locking/ww_mutex.h:346, CPU#1: syz.1.8698/395
> >> Modules linked in:
> >> CPU: 1 UID: 0 PID: 395 Comm: syz.1.8698 Not tainted 6.16.0-rc6-next-20250718-syzkaller #0 PREEMPT(full)
> >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025
> >> RIP: 0010:__clear_task_blocked_on include/linux/sched.h:2173 [inline]
> >> RIP: 0010:__ww_mutex_wound+0x21a/0x2b0 kernel/locking/ww_mutex.h:346
>
> When wounding the lock owner, could it be possible that the lock
> owner is blocked on a different nested lock? Lock owner implies it
> is not blocked on the current lock we are trying to wound, right?
>
> I remember John mentioning seeing circular chains in find_proxy_task()
> which required this but looking at this call-chain I'm wondering if
> only the __ww_mutex_check_waiters() (or some other path) requires
> __clear_task_blocked_on() for that case.

So yeah, I have tripped over this a few times (fixing it and often later
reintroducing the problem), but usually later in my full proxy-exec
series, and I somehow missed that the single-rq series already hits this.

Obviously with __ww_mutex_die() we are clearing the blocked_on
relationship for the lock waiter, but in __ww_mutex_wound() we are
waking the lock *owner*, who might be blocked on a different lock, so
passing the held lock to the __clear_task_blocked_on() check trips
the warning.
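To make the die/wound asymmetry concrete, here is a minimal user-space sketch. It is not the kernel code: `struct task`, the one-field `struct mutex`, `clear_blocked_on_strict()`, `model_die()`, `model_wound()` and the `warn_count` counter are hypothetical stand-ins. The strict check is an assumption modeled on the description above (warn on a NULL lock, or on a lock other than the one the task is blocked on), with the wait_lock serialization omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real struct mutex / task_struct differ. */
struct mutex {
	const char *name;
};

struct task {
	struct mutex *blocked_on; /* lock this task is waiting for, or NULL */
};

/* Counts would-be WARN_ON_ONCE() hits in this model. */
static int warn_count;

/*
 * Assumed strict check: warn if no lock is named, or if the task is
 * blocked on a lock other than the one named. wait_lock handling omitted.
 */
static void clear_blocked_on_strict(struct task *p, struct mutex *m)
{
	assert(p != NULL);
	if (m == NULL || (p->blocked_on != NULL && p->blocked_on != m))
		warn_count++;
	p->blocked_on = NULL;
}

/* die: the waiter is blocked on the contended lock, so the lock matches. */
static void model_die(struct task *waiter, struct mutex *lock)
{
	clear_blocked_on_strict(waiter, lock);
}

/*
 * wound: 'owner' *holds* 'held' but may be blocked on a different
 * (nested) lock, so naming 'held' can trip the check.
 */
static void model_wound(struct task *owner, struct mutex *held)
{
	clear_blocked_on_strict(owner, held);
}
```

In this model the die path names the lock the waiter is actually blocked on and stays quiet, while the wound path names a lock the owner merely holds; if that owner is blocked on some other lock B, the mismatch check fires.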

Passing NULL instead of the lock is the right call here; I'll just need
to loosen the __clear_task_blocked_on() check to accept NULL as well.
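A companion sketch of the loosened check, again a hypothetical user-space model rather than the actual patch: `clear_blocked_on()`, `model_wound_fixed()` and `warn_count` are illustrative names, and the assumed behavior is that a NULL lock skips the mismatch check while a non-NULL lock is still verified.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real struct mutex / task_struct differ. */
struct mutex {
	const char *name;
};

struct task {
	struct mutex *blocked_on; /* lock this task is waiting for, or NULL */
};

/* Counts would-be WARN_ON_ONCE() hits in this model. */
static int warn_count;

/*
 * Loosened check (assumed shape): a NULL lock means "clear whatever p is
 * blocked on" and is not checked; a non-NULL lock must still match.
 */
static void clear_blocked_on(struct task *p, struct mutex *m)
{
	assert(p != NULL);
	if (m != NULL && p->blocked_on != NULL && p->blocked_on != m)
		warn_count++;
	p->blocked_on = NULL;
}

/* Wound path, modeled fix: the owner may be blocked on any lock, so pass NULL. */
static void model_wound_fixed(struct task *owner)
{
	clear_blocked_on(owner, NULL);
}
```

The mismatch check survives for callers that do name a lock (such as the die path), so only the wound path gives up that extra sanity check.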

I'll spin up a quick patch.

thanks
-john

