Report 2 in ext4 and journal based on v5.17-rc1

Jan Kara jack at suse.cz
Mon Feb 28 10:14:44 UTC 2022


On Mon 28-02-22 18:28:26, Byungchul Park wrote:
> On Thu, Feb 24, 2022 at 11:22:39AM +0100, Jan Kara wrote:
> > On Thu 24-02-22 10:11:02, Byungchul Park wrote:
> > > On Wed, Feb 23, 2022 at 03:48:59PM +0100, Jan Kara wrote:
> > > > > KJOURNALD2(kthread)	TASK1(ksys_write)	TASK2(ksys_write)
> > > > > 
> > > > > wait A
> > > > > --- stuck
> > > > > 			wait B
> > > > > 			--- stuck
> > > > > 						wait C
> > > > > 						--- stuck
> > > > > 
> > > > > wake up B		wake up C		wake up A
> > > > > 
> > > > > where:
> > > > > A is a wait_queue, j_wait_commit
> > > > > B is a wait_queue, j_wait_transaction_locked
> > > > > C is a rwsem, mapping.invalidate_lock
> > > > 
> > > > I see. But a situation like this is not necessarily a guarantee of a
> > > > deadlock, is it? I mean there can be task D that will eventually call say
> > > > 'wake up B' and unblock everything and this is how things were designed to
> > > > work? Multiple sources of wakeups are quite common I'd say... What does
> > > 
> > > Yes. At the very beginning when I desgined Dept, I was thinking whether
> > > to support multiple wakeup sources or not for a quite long time.
> > > Supporting it would be a better option to aovid non-critical reports.
> > > However, I thought anyway we'd better fix it - not urgent tho - if
> > > there's any single circle dependency. That's why I decided not to
> > > support it for now and wanted to gather the kernel guys' opinions. Thing
> > > is which policy we should go with.
> > 
> > I see. So supporting only a single wakeup source is fine for locks I guess.
> > But for general wait queues or other synchronization mechanisms, I'm afraid
> > it will lead to quite some false positive reports. Just my 2c.
> 
> Thank you for your feedback.
> 
> I realized we've been using "false positive" differently. There exist
> the three types of code in terms of dependency and deadlock. It's worth
> noting that dependencies are built from between waits and events in Dept.
> 
> ---
> 
> case 1. Code with an actual circular dependency, but not deadlock.
> 
>    A circular dependency can be broken by a rescue wakeup source e.g.
>    timeout. It's not a deadlock. If it's okay that the contexts
>    participating in the circular dependency and others waiting for the
>    events in the circle are stuck until it gets broken. Otherwise, say,
>    if it's not meant, then it's anyway problematic.
> 
>    1-1. What if we judge this code is problematic?
>    1-2. What if we judge this code is good?
> 
> case 2. Code with an actual circular dependency, and deadlock.
> 
>    There's no other wakeup source than those within the circular
>    dependency. Literally deadlock. It's problematic and critical.
> 
>    2-1. What if we judge this code is problematic?
>    2-2. What if we judge this code is good?
> 
> case 3. Code with no actual circular dependency, and not deadlock.
> 
>    Must be good.
> 
>    3-1. What if we judge this code is problematic?
>    3-2. What if we judge this code is good?
> 
> ---
> 
> I call only 3-1 "false positive" circular dependency. And you call 1-1
> and 3-1 "false positive" deadlock.
> 
> I've been wondering if the kernel guys esp. Linus considers code with
> any circular dependency is problematic or not, even if it won't lead to
> a deadlock, say, case 1. Even though I designed Dept based on what I
> believe is right, of course, I'm willing to change the design according
> to the majority opinion.
> 
> However, I would never allow case 1 if I were the owner of the kernel
> for better stability, even though the code works anyway okay for now.

So yes, I call a report for the situation "There is circular dependency but
deadlock is not possible." a false positive. And that is because in my
opinion your definition of circular dependency includes schemes that are
useful and used in the kernel.

Your example in case 1 is kind of borderline (I personally would consider
that bug as well) but there are other more valid schemes with multiple
wakeup sources like:

We have a queue of work to do Q protected by lock L. Consumer process has
code like:

while (1) {
	lock L
	prepare_to_wait(work_queued);
	if (no work) {
		unlock L
		sleep
	} else {
		unlock L
		do work
		wake_up(work_done)
	}
}

AFAIU Dept will create dependency here that 'wakeup work_done' is after
'wait for work_queued'. Producer has code like:

while (1) {
	lock L
	prepare_to_wait(work_done)
	if (too much work queued) {
		unlock L
		sleep
	} else {
		queue work
		unlock L
		wake_up(work_queued)
	}
}

And Dept will create dependency here that 'wakeup work_queued' is after
'wait for work_done'. And thus we have a trivial cycle in the dependencies
despite the code being perfectly valid and safe.

								Honza
-- 
Jan Kara <jack at suse.com>
SUSE Labs, CR


More information about the dri-devel mailing list