[PATCH v15 00/43] DEPT(DEPendency Tracker)
Byungchul Park
byungchul at sk.com
Wed May 14 03:07:59 UTC 2025
On Tue, May 13, 2025 at 07:06:47PM +0900, Byungchul Park wrote:
> I'm happy to see that dept reported a real problem in practice. See:
>
> https://lore.kernel.org/lkml/6383cde5-cf4b-facf-6e07-1378a485657d@I-love.SAKURA.ne.jp/#t
> https://lore.kernel.org/lkml/1674268856-31807-1-git-send-email-byungchul.park@lge.com/
>
> I added a document describing dept, that would help you understand what
> dept is and how dept works. You can use dept just with CONFIG_DEPT on
> and by checking dmesg in runtime.
>
> There are still false positives here and there and some of those are
> already in progress to suppress and the efforts are essencial until it
> gets more stable as lockdep experienced.
>
> It's worth noting that EXPERIMENTAL in Kconfig is tagged.
I missed thanks for the support and contribution to:
Harry Yoo <harry.yoo at oracle.com>
Gwan-gyeong Mun <gwan-gyeong.mun at intel.com>
Yunseong Kim <yskelg at gmail.com>
Yeoreum Yun <yeoreum.yun at arm.com>
Thank you :)
Byungchul
> ---
>
> Hi Linus and folks,
>
> I've been developing a tool for detecting deadlock possibilities by
> tracking wait/event rather than lock acquisition order to try to cover
> all synchonization machanisms.
>
> Benefits:
>
> 0. Works with all lock primitives.
> 1. Works with wait_for_completion()/complete().
> 2. Works with PG_locked.
> 3. Works with swait/wakeup.
> 4. Works with waitqueue.
> 5. Works with wait_bit.
> 6. Multiple reports are allowed.
> 7. Deduplication control on multiple reports.
> 8. Withstand false positives thanks to 7.
> 9. Easy to tag any wait/event.
>
> Future works:
>
> 0. To make it more stable.
> 1. To separates dept from lockdep.
> 2. To improves performance in terms of time and space.
> 3. To use dept as a dependency engine for lockdep.
> 4. To add any missing tags of wait/event in the kernel.
> 5. To deduplicate memory space for stack traces.
>
> How to interpret reports:
> (See the document in this patchset for more detail.)
>
> [S] the start of the event context
> [W] the wait disturbing the event from being triggered
> [E] the event that cannot be reachable
>
> Thanks.
>
> Byungchul
>
> ---
>
> Changes from v14:
> 1. Rebase on the current latest, v6.15-rc6.
> 2. Refactor dept code.
> 3. With multi event sites for a single wait, even if an event
> forms a circular dependency, the event can be recovered by
> other event(or wake up) paths. Even though informing the
> circular dependency is worthy but it should be suppressed
> once informing it, if it doesn't lead an actual deadlock. So
> introduce APIs to annotate the relationship between event
> site and recover site, that are, event_site() and
> dept_recover_event().
> 4. wait_for_completion() worked with dept map embedded in struct
> completion. However, it generates a few false positves since
> all the waits using the instance of struct completion, share
> the map and key. To avoid the false positves, make it not to
> share the map and key but each wait_for_completion() caller
> have its own key by default. Of course, external maps also
> can be used if needed.
> 5. Fix a bug about hardirq on/off tracing.
> 6. Implement basic unit test for dept.
> 7. Add more supports for dma fence synchronization.
> 8. Add emergency stop of dept e.g. on panic().
> 9. Fix false positives by mmu_notifier_invalidate_*().
> 10. Fix recursive call bug by DEPT_WARN_*() and DEPT_STOP().
> 11. Fix trivial bugs in DEPT_WARN_*() and DEPT_STOP().
> 12. Fix a bug that a spin lock, dept_pool_spin, is used in
> both contexts of irq disabled and enabled without irq
> disabled.
> 13. Suppress reports with classes, any of that already have
> been reported, even though they have different chains but
> being barely meaningful.
> 14. Print stacktrace of the wait that an event is now waking up,
> not only stacktrace of the event.
> 15. Make dept aware of lockdep_cmp_fn() that is used to avoid
> false positives in lockdep so that dept can also avoid them.
> 16. Do do_event() only if there are no ecxts have been
> delimited.
> 17. Fix a bug that was not synchronized for stage_m in struct
> dept_task, using a spin lock, dept_task()->stage_lock.
> 18. Fix a bug that dept didn't handle the case that multiple
> ttwus for a single waiter can be called at the same time
> e.i. a race issue.
> 19. Distinguish each kernel context from others, not only by
> system call but also by user oriented fault so that dept can
> work with more accuracy information about kernel context.
> That helps to avoid a few false positives.
> 20. Limit dept's working to x86_64 and arm64.
>
> Changes from v13:
>
> 1. Rebase on the current latest version, v6.9-rc7.
> 2. Add 'dept' documentation describing dept APIs.
>
> Changes from v12:
>
> 1. Refine the whole document for dept.
> 2. Add 'Interpret dept report' section in the document, using a
> deadlock report obtained in practice. Hope this version of
> document helps guys understand dept better.
>
> https://lore.kernel.org/lkml/6383cde5-cf4b-facf-6e07-1378a485657d@I-love.SAKURA.ne.jp/#t
> https://lore.kernel.org/lkml/1674268856-31807-1-git-send-email-byungchul.park@lge.com/
>
> Changes from v11:
>
> 1. Add 'dept' documentation describing the concept of dept.
> 2. Rewrite the commit messages of the following commits for
> using weaker lockdep annotation, for better description.
>
> fs/jbd2: Use a weaker annotation in journal handling
> cpu/hotplug: Use a weaker annotation in AP thread
>
> (feedbacked by Thomas Gleixner)
>
> Changes from v10:
>
> 1. Fix noinstr warning when building kernel source.
> 2. dept has been reporting some false positives due to the folio
> lock's unfairness. Reflect it and make dept work based on
> dept annotaions instead of just wait and wake up primitives.
> 3. Remove the support for PG_writeback while working on 2. I
> will add the support later if needed.
> 4. dept didn't print stacktrace for [S] if the participant of a
> deadlock is not lock mechanism but general wait and event.
> However, it made hard to interpret the report in that case.
> So add support to print stacktrace of the requestor who asked
> the event context to run - usually a waiter of the event does
> it just before going to wait state.
> 5. Give up tracking raw_local_irq_{disable,enable}() since it
> totally messed up dept's irq tracking. So make it work in the
> same way as lockdep does. I will consider it once any false
> positives by those are observed again.
> 6. Change the manual rwsem_acquire_read(->j_trans_commit_map)
> annotation in fs/jbd2/transaction.c to the try version so
> that it works as much as it exactly needs.
> 7. Remove unnecessary 'inline' keyword in dept.c and add
> '__maybe_unused' to a needed place.
>
> Changes from v9:
>
> 1. Fix a bug. SDT tracking didn't work well because of my big
> mistake that I should've used waiter's map to indentify its
> class but it had been working with waker's one. FYI,
> PG_locked and PG_writeback weren't affected. They still
> worked well. (reported by YoungJun)
>
> Changes from v8:
>
> 1. Fix build error by adding EXPORT_SYMBOL(PG_locked_map) and
> EXPORT_SYMBOL(PG_writeback_map) for kernel module build -
> appologize for that. (reported by kernel test robot)
> 2. Fix build error by removing header file's circular dependency
> that was caused by "atomic.h", "kernel.h" and "irqflags.h",
> which I introduced - appolgize for that. (reported by kernel
> test robot)
>
> Changes from v7:
>
> 1. Fix a bug that cannot track rwlock dependency properly,
> introduced in v7. (reported by Boqun and lockdep selftest)
> 2. Track wait/event of PG_{locked,writeback} more aggressively
> assuming that when a bit of PG_{locked,writeback} is cleared
> there might be waits on the bit. (reported by Linus, Hillf
> and syzbot)
> 3. Fix and clean bad style code e.i. unnecessarily introduced
> a randome pattern and so on. (pointed out by Linux)
> 4. Clean code for applying dept to wait_for_completion().
>
> Changes from v6:
>
> 1. Tie to task scheduler code to track sleep and try_to_wake_up()
> assuming sleeps cause waits, try_to_wake_up()s would be the
> events that those are waiting for, of course with proper dept
> annotations, sdt_might_sleep_weak(), sdt_might_sleep_strong()
> and so on. For these cases, class is classified at sleep
> entrance rather than the synchronization initialization code.
> Which would extremely reduce false alarms.
> 2. Remove the dept associated instance in each page struct for
> tracking dependencies by PG_locked and PG_writeback thanks to
> the 1. work above.
> 3. Introduce CONFIG_dept_AGGRESIVE_TIMEOUT_WAIT to suppress
> reports that waits with timeout set are involved, for those
> who don't like verbose reporting.
> 4. Add a mechanism to refill the internal memory pools on
> running out so that dept could keep working as long as free
> memory is available in the system.
> 5. Re-enable tracking hashed-waitqueue wait. That's going to no
> longer generate false positives because class is classified
> at sleep entrance rather than the waitqueue initailization.
> 6. Refactor to make it easier to port onto each new version of
> the kernel.
> 7. Apply dept to dma fence.
> 8. Do trivial optimizaitions.
>
> Changes from v5:
>
> 1. Use just pr_warn_once() rather than WARN_ONCE() on the lack
> of internal resources because WARN_*() printing stacktrace is
> too much for informing the lack. (feedback from Ted, Hyeonggon)
> 2. Fix trivial bugs like missing initializing a struct before
> using it.
> 3. Assign a different class per task when handling onstack
> variables for waitqueue or the like. Which makes dept
> distinguish between onstack variables of different tasks so
> as to prevent false positives. (reported by Hyeonggon)
> 4. Make dept aware of even raw_local_irq_*() to prevent false
> positives. (reported by Hyeonggon)
> 5. Don't consider dependencies between the events that might be
> triggered within __schedule() and the waits that requires
> __schedule(), real ones. (reported by Hyeonggon)
> 6. Unstage the staged wait that has prepare_to_wait_event()'ed
> *and* yet to get to __schedule(), if we encounter __schedule()
> in-between for another sleep, which is possible if e.g. a
> mutex_lock() exists in 'condition' of ___wait_event().
> 7. Turn on CONFIG_PROVE_LOCKING when CONFIG_DEPT is on, to rely
> on the hardirq and softirq entrance tracing to make dept more
> portable for now.
>
> Changes from v4:
>
> 1. Fix some bugs that produce false alarms.
> 2. Distinguish each syscall context from another *for arm64*.
> 3. Make it not warn it but just print it in case dept ring
> buffer gets exhausted. (feedback from Hyeonggon)
> 4. Explicitely describe "EXPERIMENTAL" and "dept might produce
> false positive reports" in Kconfig. (feedback from Ted)
>
> Changes from v3:
>
> 1. dept shouldn't create dependencies between different depths
> of a class that were indicated by *_lock_nested(). dept
> normally doesn't but it does once another lock class comes
> in. So fixed it. (feedback from Hyeonggon)
> 2. dept considered a wait as a real wait once getting to
> __schedule() even if it has been set to TASK_RUNNING by wake
> up sources in advance. Fixed it so that dept doesn't consider
> the case as a real wait. (feedback from Jan Kara)
> 3. Stop tracking dependencies with a map once the event
> associated with the map has been handled. dept will start to
> work with the map again, on the next sleep.
>
> Changes from v2:
>
> 1. Disable dept on bit_wait_table[] in sched/wait_bit.c
> reporting a lot of false positives, which is my fault.
> Wait/event for bit_wait_table[] should've been tagged in a
> higher layer for better work, which is a future work.
> (feedback from Jan Kara)
> 2. Disable dept on crypto_larval's completion to prevent a false
> positive.
>
> Changes from v1:
>
> 1. Fix coding style and typo. (feedback from Steven)
> 2. Distinguish each work context from another in workqueue.
> 3. Skip checking lock acquisition with nest_lock, which is about
> correct lock usage that should be checked by lockdep.
>
> Changes from RFC(v0):
>
> 1. Prevent adding a wait tag at prepare_to_wait() but __schedule().
> (feedback from Linus and Matthew)
> 2. Use try version at lockdep_acquire_cpus_lock() annotation.
> 3. Distinguish each syscall context from another.
>
> Byungchul Park (43):
> llist: move llist_{head,node} definition to types.h
> dept: implement DEPT(DEPendency Tracker)
> dept: add single event dependency tracker APIs
> dept: add lock dependency tracker APIs
> dept: tie to lockdep and IRQ tracing
> dept: add proc knobs to show stats and dependency graph
> dept: distinguish each kernel context from another
> x86_64, dept: add support CONFIG_ARCH_HAS_DEPT_SUPPORT to x86_64
> arm64, dept: add support CONFIG_ARCH_HAS_DEPT_SUPPORT to arm64
> dept: distinguish each work from another
> dept: add a mechanism to refill the internal memory pools on running
> out
> dept: record the latest one out of consecutive waits of the same class
> dept: apply sdt_might_sleep_{start,end}() to
> wait_for_completion()/complete()
> dept: apply sdt_might_sleep_{start,end}() to swait
> dept: apply sdt_might_sleep_{start,end}() to waitqueue wait
> dept: apply sdt_might_sleep_{start,end}() to hashed-waitqueue wait
> dept: apply sdt_might_sleep_{start,end}() to dma fence
> dept: track timeout waits separately with a new Kconfig
> dept: apply timeout consideration to wait_for_completion()/complete()
> dept: apply timeout consideration to swait
> dept: apply timeout consideration to waitqueue wait
> dept: apply timeout consideration to hashed-waitqueue wait
> dept: apply timeout consideration to dma fence wait
> dept: make dept able to work with an external wgen
> dept: track PG_locked with dept
> dept: print staged wait's stacktrace on report
> locking/lockdep: prevent various lockdep assertions when
> lockdep_off()'ed
> dept: suppress reports with classes that have been already reported
> dept: add documentation for dept
> cpu/hotplug: use a weaker annotation in AP thread
> fs/jbd2: use a weaker annotation in journal handling
> dept: assign dept map to mmu notifier invalidation synchronization
> dept: assign unique dept_key to each distinct dma fence caller
> dept: make dept aware of lockdep_set_lock_cmp_fn() annotation
> dept: make dept stop from working on debug_locks_off()
> i2c: rename wait_for_completion callback to wait_for_completion_cb
> dept: assign unique dept_key to each distinct wait_for_completion()
> caller
> completion, dept: introduce init_completion_dmap() API
> dept: introduce a new type of dependency tracking between multi event
> sites
> dept: add module support for struct dept_event_site and
> dept_event_site_dep
> dept: introduce event_site() to disable event tracking if it's
> recoverable
> dept: implement a basic unit test for dept
> dept: call dept_hardirqs_off() in local_irq_*() regardless of irq
> state
>
> Documentation/dependency/dept.txt | 735 ++++++
> Documentation/dependency/dept_api.txt | 117 +
> arch/arm64/Kconfig | 1 +
> arch/arm64/kernel/syscall.c | 7 +
> arch/arm64/mm/fault.c | 7 +
> arch/x86/Kconfig | 1 +
> arch/x86/entry/syscall_64.c | 7 +
> arch/x86/mm/fault.c | 7 +
> drivers/dma-buf/dma-fence.c | 17 +-
> drivers/i2c/algos/i2c-algo-pca.c | 2 +-
> drivers/i2c/busses/i2c-pca-isa.c | 2 +-
> drivers/i2c/busses/i2c-pca-platform.c | 2 +-
> fs/jbd2/transaction.c | 2 +-
> include/asm-generic/vmlinux.lds.h | 13 +-
> include/linux/completion.h | 124 +-
> include/linux/dept.h | 625 +++++
> include/linux/dept_ldt.h | 77 +
> include/linux/dept_sdt.h | 67 +
> include/linux/dept_unit_test.h | 67 +
> include/linux/dma-fence.h | 74 +-
> include/linux/hardirq.h | 3 +
> include/linux/i2c-algo-pca.h | 2 +-
> include/linux/irqflags.h | 21 +-
> include/linux/llist.h | 8 -
> include/linux/local_lock_internal.h | 1 +
> include/linux/lockdep.h | 105 +-
> include/linux/lockdep_types.h | 3 +
> include/linux/mm_types.h | 2 +
> include/linux/mmu_notifier.h | 26 +
> include/linux/module.h | 5 +
> include/linux/mutex.h | 1 +
> include/linux/page-flags.h | 125 +-
> include/linux/pagemap.h | 7 +-
> include/linux/percpu-rwsem.h | 2 +-
> include/linux/rtmutex.h | 1 +
> include/linux/rwlock_types.h | 1 +
> include/linux/rwsem.h | 1 +
> include/linux/sched.h | 120 +-
> include/linux/seqlock.h | 2 +-
> include/linux/spinlock_types_raw.h | 3 +
> include/linux/srcu.h | 2 +-
> include/linux/swait.h | 3 +
> include/linux/types.h | 8 +
> include/linux/wait.h | 3 +
> include/linux/wait_bit.h | 3 +
> init/init_task.c | 2 +
> init/main.c | 2 +
> kernel/Makefile | 1 +
> kernel/cpu.c | 2 +-
> kernel/dependency/Makefile | 5 +
> kernel/dependency/dept.c | 3510 +++++++++++++++++++++++++
> kernel/dependency/dept_hash.h | 10 +
> kernel/dependency/dept_internal.h | 64 +
> kernel/dependency/dept_object.h | 13 +
> kernel/dependency/dept_proc.c | 93 +
> kernel/dependency/dept_unit_test.c | 173 ++
> kernel/exit.c | 1 +
> kernel/fork.c | 2 +
> kernel/locking/lockdep.c | 33 +
> kernel/module/main.c | 19 +
> kernel/sched/completion.c | 62 +-
> kernel/sched/core.c | 8 +
> kernel/workqueue.c | 3 +
> lib/Kconfig.debug | 51 +
> lib/debug_locks.c | 2 +
> lib/locking-selftest.c | 2 +
> mm/filemap.c | 26 +
> mm/mm_init.c | 2 +
> mm/mmu_notifier.c | 31 +-
> 69 files changed, 6404 insertions(+), 125 deletions(-)
> create mode 100644 Documentation/dependency/dept.txt
> create mode 100644 Documentation/dependency/dept_api.txt
> create mode 100644 include/linux/dept.h
> create mode 100644 include/linux/dept_ldt.h
> create mode 100644 include/linux/dept_sdt.h
> create mode 100644 include/linux/dept_unit_test.h
> create mode 100644 kernel/dependency/Makefile
> create mode 100644 kernel/dependency/dept.c
> create mode 100644 kernel/dependency/dept_hash.h
> create mode 100644 kernel/dependency/dept_internal.h
> create mode 100644 kernel/dependency/dept_object.h
> create mode 100644 kernel/dependency/dept_proc.c
> create mode 100644 kernel/dependency/dept_unit_test.c
>
>
> base-commit: 82f2b0b97b36ee3fcddf0f0780a9a0825d52fec3
> --
> 2.17.1
More information about the dri-devel
mailing list