Report 2 in ext4 and journal based on v5.17-rc1
Jan Kara
jack at suse.cz
Mon Feb 28 10:14:44 UTC 2022
On Mon 28-02-22 18:28:26, Byungchul Park wrote:
> On Thu, Feb 24, 2022 at 11:22:39AM +0100, Jan Kara wrote:
> > On Thu 24-02-22 10:11:02, Byungchul Park wrote:
> > > On Wed, Feb 23, 2022 at 03:48:59PM +0100, Jan Kara wrote:
> > > > > KJOURNALD2(kthread) TASK1(ksys_write) TASK2(ksys_write)
> > > > >
> > > > > wait A
> > > > > --- stuck
> > > > > wait B
> > > > > --- stuck
> > > > > wait C
> > > > > --- stuck
> > > > >
> > > > > wake up B wake up C wake up A
> > > > >
> > > > > where:
> > > > > A is a wait_queue, j_wait_commit
> > > > > B is a wait_queue, j_wait_transaction_locked
> > > > > C is a rwsem, mapping.invalidate_lock
> > > >
> > > > I see. But a situation like this is not necessarily a guarantee of a
> > > > deadlock, is it? I mean there can be task D that will eventually call say
> > > > 'wake up B' and unblock everything and this is how things were designed to
> > > > work? Multiple sources of wakeups are quite common I'd say... What does
> > >
> > > Yes. At the very beginning when I desgined Dept, I was thinking whether
> > > to support multiple wakeup sources or not for a quite long time.
> > > Supporting it would be a better option to aovid non-critical reports.
> > > However, I thought anyway we'd better fix it - not urgent tho - if
> > > there's any single circle dependency. That's why I decided not to
> > > support it for now and wanted to gather the kernel guys' opinions. Thing
> > > is which policy we should go with.
> >
> > I see. So supporting only a single wakeup source is fine for locks I guess.
> > But for general wait queues or other synchronization mechanisms, I'm afraid
> > it will lead to quite some false positive reports. Just my 2c.
>
> Thank you for your feedback.
>
> I realized we've been using "false positive" differently. There exist
> the three types of code in terms of dependency and deadlock. It's worth
> noting that dependencies are built from between waits and events in Dept.
>
> ---
>
> case 1. Code with an actual circular dependency, but not deadlock.
>
> A circular dependency can be broken by a rescue wakeup source e.g.
> timeout. It's not a deadlock. If it's okay that the contexts
> participating in the circular dependency and others waiting for the
> events in the circle are stuck until it gets broken. Otherwise, say,
> if it's not meant, then it's anyway problematic.
>
> 1-1. What if we judge this code is problematic?
> 1-2. What if we judge this code is good?
>
> case 2. Code with an actual circular dependency, and deadlock.
>
> There's no other wakeup source than those within the circular
> dependency. Literally deadlock. It's problematic and critical.
>
> 2-1. What if we judge this code is problematic?
> 2-2. What if we judge this code is good?
>
> case 3. Code with no actual circular dependency, and not deadlock.
>
> Must be good.
>
> 3-1. What if we judge this code is problematic?
> 3-2. What if we judge this code is good?
>
> ---
>
> I call only 3-1 "false positive" circular dependency. And you call 1-1
> and 3-1 "false positive" deadlock.
>
> I've been wondering if the kernel guys esp. Linus considers code with
> any circular dependency is problematic or not, even if it won't lead to
> a deadlock, say, case 1. Even though I designed Dept based on what I
> believe is right, of course, I'm willing to change the design according
> to the majority opinion.
>
> However, I would never allow case 1 if I were the owner of the kernel
> for better stability, even though the code works anyway okay for now.
So yes, I call a report for the situation "There is circular dependency but
deadlock is not possible." a false positive. And that is because in my
opinion your definition of circular dependency includes schemes that are
useful and used in the kernel.
Your example in case 1 is kind of borderline (I personally would consider
that bug as well) but there are other more valid schemes with multiple
wakeup sources like:
We have a queue of work to do Q protected by lock L. Consumer process has
code like:
while (1) {
lock L
prepare_to_wait(work_queued);
if (no work) {
unlock L
sleep
} else {
unlock L
do work
wake_up(work_done)
}
}
AFAIU Dept will create dependency here that 'wakeup work_done' is after
'wait for work_queued'. Producer has code like:
while (1) {
lock L
prepare_to_wait(work_done)
if (too much work queued) {
unlock L
sleep
} else {
queue work
unlock L
wake_up(work_queued)
}
}
And Dept will create dependency here that 'wakeup work_queued' is after
'wait for work_done'. And thus we have a trivial cycle in the dependencies
despite the code being perfectly valid and safe.
Honza
--
Jan Kara <jack at suse.com>
SUSE Labs, CR
More information about the dri-devel
mailing list