[PATCH v15 00/43] DEPT(DEPendency Tracker)

Byungchul Park byungchul at sk.com
Wed May 14 03:07:59 UTC 2025


On Tue, May 13, 2025 at 07:06:47PM +0900, Byungchul Park wrote:
> I'm happy to see that dept reported a real problem in practice. See:
> 
>    https://lore.kernel.org/lkml/6383cde5-cf4b-facf-6e07-1378a485657d@I-love.SAKURA.ne.jp/#t
>    https://lore.kernel.org/lkml/1674268856-31807-1-git-send-email-byungchul.park@lge.com/
> 
> I added a document describing dept, which should help you understand
> what dept is and how it works. To use dept, just turn on CONFIG_DEPT
> and check dmesg at runtime.
> 
> There are still false positives here and there. Work to suppress some
> of them is already in progress, and such efforts will be essential
> until dept becomes more stable, as was the case with lockdep.
> 
> It's worth noting that dept is tagged EXPERIMENTAL in Kconfig.

I forgot to thank the following people for their support and contributions:

   Harry Yoo <harry.yoo at oracle.com>
   Gwan-gyeong Mun <gwan-gyeong.mun at intel.com>
   Yunseong Kim <yskelg at gmail.com>
   Yeoreum Yun <yeoreum.yun at arm.com>

Thank you :)

	Byungchul

> ---
> 
> Hi Linus and folks,
> 
> I've been developing a tool for detecting deadlock possibilities by
> tracking wait/event rather than lock acquisition order to try to cover
> all synchronization mechanisms.
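To make the wait/event idea above concrete, here is a minimal userspace sketch in C. It is a toy model, not dept's implementation: it assumes a small fixed table of classes, ignores contexts, IRQs and stack traces, and all names (toy_add_dep(), toy_has_cycle() and so on) are hypothetical. Each class has a wait side and an event side, an edge records that an event is only reached after some wait, and a cycle of such edges is a deadlock possibility.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Toy model, not dept's implementation. Each node is a class that has
 * both a wait side and an event side, e.g. a mutex (wait: lock, event:
 * unlock) or a completion (wait: wait_for_completion(), event: complete()).
 * toy_dep[e][w] means: the context that would trigger the event of class
 * e first waits on class w, so e cannot happen before w is satisfied.
 */
#define TOY_NR_CLASSES 8

enum toy_class { TOY_MUTEX_A, TOY_COMPLETION_C };

static bool toy_dep[TOY_NR_CLASSES][TOY_NR_CLASSES];

static void toy_reset(void)
{
	memset(toy_dep, 0, sizeof(toy_dep));
}

/* Record that the event of class 'event' is preceded by a wait on 'wait'. */
static void toy_add_dep(int event, int wait)
{
	toy_dep[event][wait] = true;
}

enum { TOY_WHITE, TOY_GREY, TOY_BLACK };

/* Depth-first search; meeting a GREY node again means a cycle. */
static bool toy_dfs(int n, int color[])
{
	color[n] = TOY_GREY;
	for (int m = 0; m < TOY_NR_CLASSES; m++) {
		if (!toy_dep[n][m])
			continue;
		if (color[m] == TOY_GREY)
			return true;
		if (color[m] == TOY_WHITE && toy_dfs(m, color))
			return true;
	}
	color[n] = TOY_BLACK;
	return false;
}

/* A cycle of "event is preceded by wait" edges is a deadlock possibility. */
static bool toy_has_cycle(void)
{
	int color[TOY_NR_CLASSES] = { TOY_WHITE };

	for (int n = 0; n < TOY_NR_CLASSES; n++)
		if (color[n] == TOY_WHITE && toy_dfs(n, color))
			return true;
	return false;
}
```

For example, if task 1 runs mutex_lock(&A); wait_for_completion(&C); mutex_unlock(&A), the unlock event of A is preceded by a wait on C, giving toy_add_dep(TOY_MUTEX_A, TOY_COMPLETION_C). If task 2 runs mutex_lock(&A); complete(&C); mutex_unlock(&A), the complete() event of C is preceded by a wait on A, giving toy_add_dep(TOY_COMPLETION_C, TOY_MUTEX_A). The two edges form a cycle, so the possibility is flagged even if the two tasks never actually interleave badly during the run.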
> 
> Benefits:
> 
> 	0. Works with all lock primitives.
> 	1. Works with wait_for_completion()/complete().
> 	2. Works with PG_locked.
> 	3. Works with swait/wakeup.
> 	4. Works with waitqueue.
> 	5. Works with wait_bit.
> 	6. Multiple reports are allowed.
> 	7. Deduplication control on multiple reports.
> 	8. Withstands false positives thanks to 7.
> 	9. Easy to tag any wait/event.
> 
> Future works:
> 
> 	0. To make it more stable.
> 	1. To separate dept from lockdep.
> 	2. To improve performance in terms of time and space.
> 	3. To use dept as a dependency engine for lockdep.
> 	4. To add any missing tags of wait/event in the kernel.
> 	5. To deduplicate memory space for stack traces.
> 
> How to interpret reports:
> (See the document in this patchset for more detail.)
> 
> 	[S] the start of the event context
> 	[W] the wait preventing the event from being triggered
> 	[E] the event that cannot be reached
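As a hypothetical illustration of the legend, take a task that runs mutex_lock(&A); wait_for_completion(&C); mutex_unlock(&A). The event context of A starts at mutex_lock(&A) [S], the wait on C [W] delays the event, and mutex_unlock(&A) [E] may never be reached if C is never completed. The sketch below only records and formats such a triple; the struct, the function and the output layout are made up and do not reproduce dept's actual report format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical record of one dependency in the [S]/[W]/[E] terms of the
 * report legend. Field names and output layout are illustrative only.
 */
struct toy_dep_record {
	const char *start; /* [S] where the event context started */
	const char *wait;  /* [W] the wait that delays the event */
	const char *event; /* [E] the event that may never be reached */
};

/* Format one dependency into buf; returns what snprintf() returns. */
static int toy_format_dep(char *buf, size_t len,
			  const struct toy_dep_record *d)
{
	return snprintf(buf, len, "[S] %s\n[W] %s\n[E] %s\n",
			d->start, d->wait, d->event);
}
```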
> 
> Thanks.
> 
> 	Byungchul
> 
> ---
> 
> Changes from v14:
> 
> 	1. Rebase on the latest, v6.15-rc6.
> 	2. Refactor dept code.
> 	3. When a single wait has multiple event sites, even if one
> 	   event forms a circular dependency, the wait can still be
> 	   recovered through other event (or wake-up) paths.  Reporting
> 	   such a circular dependency is worthwhile, but once reported,
> 	   it should be suppressed if it doesn't lead to an actual
> 	   deadlock.  So introduce APIs to annotate the relationship
> 	   between an event site and its recover site, namely
> 	   event_site() and dept_recover_event().
> 	4. wait_for_completion() used the dept map embedded in struct
> 	   completion.  However, that generated a few false positives
> 	   because all the waits on the same struct completion instance
> 	   shared the map and key.  To avoid those false positives, stop
> 	   sharing the map and key and give each wait_for_completion()
> 	   caller its own key by default.  External maps can still be
> 	   used if needed.
> 	5. Fix a bug about hardirq on/off tracing.
> 	6. Implement basic unit test for dept.
> 	7. Add more support for dma fence synchronization.
> 	8. Add emergency stop of dept e.g. on panic().
> 	9. Fix false positives by mmu_notifier_invalidate_*().
> 	10. Fix recursive call bug by DEPT_WARN_*() and DEPT_STOP().
> 	11. Fix trivial bugs in DEPT_WARN_*() and DEPT_STOP().
> 	12. Fix a bug where a spin lock, dept_pool_spin, was taken both
> 	    with irqs disabled and with irqs enabled, without disabling
> 	    irqs in the latter case.
> 	13. Suppress reports involving classes that have already been
> 	    reported, even if the chains differ, since such reports are
> 	    rarely meaningful.
> 	14. Print the stacktrace of the wait that an event is waking up,
> 	    not only the stacktrace of the event.
> 	15. Make dept aware of lockdep_cmp_fn(), which is used to avoid
> 	    false positives in lockdep, so that dept can avoid them too.
> 	16. Call do_event() only if no ecxts have been delimited.
> 	17. Fix missing synchronization on stage_m in struct dept_task
> 	    by protecting it with a spin lock, dept_task()->stage_lock.
> 	18. Fix a bug where dept didn't handle multiple ttwus being
> 	    called at the same time for a single waiter, i.e. a race.
> 	19. Distinguish each kernel context from others not only by
> 	    system call but also by user-originated fault, so that dept
> 	    works with more accurate information about kernel contexts.
> 	    This helps avoid a few false positives.
> 	20. Limit dept to x86_64 and arm64.
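Item 4 of the v14 changes relies on giving each wait_for_completion() caller its own key. A common C idiom for this, also used by lockdep-aware init macros such as mutex_init(), is to declare a static object inside a macro so that every expansion gets a distinct address. The userspace sketch below shows only that idiom; struct toy_completion, struct toy_key, toy_wait() and toy_wait_for_completion() are hypothetical stand-ins, not the code in this series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not dept's real types. */
struct toy_completion {
	int done;
};

struct toy_key {
	const char *site; /* "file:line" of the call site, for reports */
};

#define TOY_STR2(x) #x
#define TOY_STR(x) TOY_STR2(x)

/* Remembers which key classified the most recent wait. */
static const struct toy_key *toy_last_key;

/* Pretend wait: a real tracker would look up the class by 'key'. */
static void toy_wait(struct toy_completion *c, const struct toy_key *key)
{
	(void)c;
	toy_last_key = key;
}

/*
 * Each expansion of this macro defines its own static key, so each call
 * site gets a distinct key address, while repeated calls from one site
 * (e.g. in a loop) keep sharing that site's key.
 */
#define toy_wait_for_completion(c)                                   \
	do {                                                         \
		static const struct toy_key toy_site_key = {         \
			.site = __FILE__ ":" TOY_STR(__LINE__),      \
		};                                                   \
		toy_wait((c), &toy_site_key);                        \
	} while (0)
```

The design trade-off is that waits on the same completion from different call sites no longer share a class, which removes the false positives caused by unrelated waiters, at the cost of one small static object per call site.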
> 
> Changes from v13:
> 
> 	1. Rebase on the latest version, v6.9-rc7.
> 	2. Add 'dept' documentation describing dept APIs.
> 
> Changes from v12:
> 
> 	1. Refine the whole document for dept.
> 	2. Add an 'Interpret dept report' section to the document, using
> 	   a deadlock report obtained in practice. I hope this version of
> 	   the document helps people understand dept better.
> 
> 	   https://lore.kernel.org/lkml/6383cde5-cf4b-facf-6e07-1378a485657d@I-love.SAKURA.ne.jp/#t
> 	   https://lore.kernel.org/lkml/1674268856-31807-1-git-send-email-byungchul.park@lge.com/
> 
> Changes from v11:
> 
> 	1. Add 'dept' documentation describing the concept of dept.
> 	2. Rewrite, for a better description, the commit messages of the
> 	   following commits that use a weaker lockdep annotation:
> 
> 	   fs/jbd2: Use a weaker annotation in journal handling
> 	   cpu/hotplug: Use a weaker annotation in AP thread
> 
> 	   (feedback from Thomas Gleixner)
> 
> Changes from v10:
> 
> 	1. Fix a noinstr warning when building the kernel.
> 	2. dept has been reporting some false positives due to the folio
> 	   lock's unfairness. Reflect that by making dept work based on
> 	   dept annotations instead of just wait and wake-up primitives.
> 	3. Remove the support for PG_writeback while working on 2. I
> 	   will add the support later if needed.
> 	4. dept didn't print a stacktrace for [S] if a participant in a
> 	   deadlock was not a lock but a general wait and event, which
> 	   made the report hard to interpret in that case. So add
> 	   support for printing the stacktrace of the requestor that
> 	   asked the event context to run; usually a waiter of the event
> 	   does this just before going into the wait state.
> 	5. Give up tracking raw_local_irq_{disable,enable}() since it
> 	   totally messed up dept's irq tracking, and make dept work the
> 	   same way lockdep does. I will reconsider it if any false
> 	   positives caused by those are observed again.
> 	6. Change the manual rwsem_acquire_read(->j_trans_commit_map)
> 	   annotation in fs/jbd2/transaction.c to the try version so
> 	   that it works exactly as much as needed.
> 	7. Remove unnecessary 'inline' keyword in dept.c and add
> 	   '__maybe_unused' to a needed place.
> 
> Changes from v9:
> 
> 	1. Fix a bug. SDT tracking didn't work well because of a big
> 	   mistake of mine: I should have used the waiter's map to
> 	   identify its class, but it had been using the waker's. FYI,
> 	   PG_locked and PG_writeback weren't affected; they still
> 	   worked well. (reported by YoungJun)
> 
> Changes from v8:
> 
> 	1. Fix a build error by adding EXPORT_SYMBOL(PG_locked_map) and
> 	   EXPORT_SYMBOL(PG_writeback_map) for kernel module builds;
> 	   apologies for that. (reported by kernel test robot)
> 	2. Fix a build error by removing a circular header dependency
> 	   among "atomic.h", "kernel.h" and "irqflags.h", which I
> 	   introduced; apologies for that. (reported by kernel test
> 	   robot)
> 
> Changes from v7:
> 
> 	1. Fix a bug, introduced in v7, where rwlock dependencies could
> 	   not be tracked properly. (reported by Boqun and lockdep selftest)
> 	2. Track wait/event of PG_{locked,writeback} more aggressively,
> 	   assuming that when a bit of PG_{locked,writeback} is cleared
> 	   there might be waits on the bit. (reported by Linus, Hillf
> 	   and syzbot)
> 	3. Fix and clean up badly styled code, i.e. an unnecessarily
> 	   introduced random pattern and so on. (pointed out by Linus)
> 	4. Clean code for applying dept to wait_for_completion().
> 
> Changes from v6:
> 
> 	1. Tie into the task scheduler code to track sleeps and
> 	   try_to_wake_up(), assuming that sleeps cause waits and that
> 	   try_to_wake_up()s are the events those waits are waiting for,
> 	   of course with proper dept annotations such as
> 	   sdt_might_sleep_weak(), sdt_might_sleep_strong() and so on.
> 	   In these cases, the class is assigned at sleep entrance rather
> 	   than in the synchronization initialization code, which greatly
> 	   reduces false alarms.
> 	2. Remove the dept instance associated with each struct page for
> 	   tracking dependencies by PG_locked and PG_writeback, thanks to
> 	   work 1 above.
> 	3. Introduce CONFIG_DEPT_AGGRESSIVE_TIMEOUT_WAIT to suppress
> 	   reports involving waits with a timeout set, for those who
> 	   don't like verbose reporting.
> 	4. Add a mechanism to refill the internal memory pools when they
> 	   run out so that dept can keep working as long as free memory
> 	   is available in the system.
> 	5. Re-enable tracking hashed-waitqueue waits. These no longer
> 	   generate false positives because the class is assigned at
> 	   sleep entrance rather than at waitqueue initialization.
> 	6. Refactor to make it easier to port onto each new version of
> 	   the kernel.
> 	7. Apply dept to dma fence.
> 	8. Do trivial optimizations.
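Item 3 of the v6 changes makes the tracking of waits with a timeout configurable. Below is a minimal sketch of such a filter, assuming that with the aggressive option off, dependencies through timed waits are simply not recorded; struct toy_dep, toy_should_track() and toy_count_tracked() are hypothetical names, and dept's real handling may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical record of one dependency; names are illustrative only. */
struct toy_dep {
	int event;  /* class whose event is delayed */
	int wait;   /* class waited on before that event */
	bool timed; /* the wait had a timeout set */
};

/*
 * Assumed policy: dependencies through timed waits are dropped unless the
 * aggressive option is enabled. A cycle going through a timed wait cannot
 * block forever, because that wait eventually returns on its own.
 */
static bool toy_should_track(const struct toy_dep *d, bool aggressive)
{
	return !d->timed || aggressive;
}

/* Count how many of the n dependencies would be tracked. */
static int toy_count_tracked(const struct toy_dep *deps, int n,
			     bool aggressive)
{
	int tracked = 0;

	for (int i = 0; i < n; i++)
		if (toy_should_track(&deps[i], aggressive))
			tracked++;
	return tracked;
}
```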
> 
> Changes from v5:
> 
> 	1. Use pr_warn_once() rather than WARN_ONCE() when internal
> 	   resources run out, because a WARN_*() stacktrace is too much
> 	   just to report the shortage. (feedback from Ted, Hyeonggon)
> 	2. Fix trivial bugs such as failing to initialize a struct before
> 	   using it.
> 	3. Assign a different class per task when handling on-stack
> 	   variables for waitqueues or the like. This lets dept
> 	   distinguish between on-stack variables of different tasks,
> 	   preventing false positives. (reported by Hyeonggon)
> 	4. Make dept aware of even raw_local_irq_*() to prevent false
> 	   positives. (reported by Hyeonggon)
> 	5. Don't consider dependencies between the events that might be
> 	   triggered within __schedule() and the waits that require
> 	   __schedule(), i.e. real ones. (reported by Hyeonggon)
> 	6. Unstage a staged wait that has called prepare_to_wait_event()
> 	   *and* has yet to reach __schedule() if we encounter
> 	   __schedule() in between for another sleep, which is possible
> 	   if, e.g., a mutex_lock() exists in the 'condition' of
> 	   ___wait_event().
> 	7. Turn on CONFIG_PROVE_LOCKING when CONFIG_DEPT is on, to rely
> 	   on the hardirq and softirq entrance tracing to make dept more
> 	   portable for now.
> 
> Changes from v4:
> 
> 	1. Fix some bugs that produce false alarms.
> 	2. Distinguish each syscall context from another *for arm64*.
> 	3. Just print a message instead of warning when the dept ring
> 	   buffer is exhausted. (feedback from Hyeonggon)
> 	4. Explicitly state "EXPERIMENTAL" and "dept might produce
> 	   false positive reports" in Kconfig. (feedback from Ted)
> 
> Changes from v3:
> 
> 	1. dept shouldn't create dependencies between different depths
> 	   of a class that were indicated by *_lock_nested(). dept
> 	   normally doesn't, but it did once another lock class came
> 	   in. Fixed it. (feedback from Hyeonggon)
> 	2. dept considered a wait a real wait once it got to
> 	   __schedule(), even if the task had already been set to
> 	   TASK_RUNNING by a wake-up source. Fixed it so that dept
> 	   doesn't consider that case a real wait. (feedback from Jan Kara)
> 	3. Stop tracking dependencies with a map once the event
> 	   associated with the map has been handled. dept will start to
> 	   work with the map again, on the next sleep.
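Item 2 of the v3 changes distinguishes a staged wait from a real one. The toy state machine below, with hypothetical names throughout, sketches that rule under the assumption that a wait staged at prepare-to-wait time is committed only if the task is still not runnable when it reaches the scheduler.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy task states and task; not the kernel's definitions. */
enum toy_state { TOY_TASK_RUNNING, TOY_TASK_UNINTERRUPTIBLE };

struct toy_task {
	enum toy_state state;
	bool wait_staged;
};

/* Like prepare_to_wait(): set the sleeping state and stage the wait. */
static void toy_prepare_to_wait(struct toy_task *t)
{
	t->state = TOY_TASK_UNINTERRUPTIBLE;
	t->wait_staged = true;
}

/* A wake-up source makes the task runnable again. */
static void toy_wake_up(struct toy_task *t)
{
	t->state = TOY_TASK_RUNNING;
}

/*
 * Like reaching __schedule(): returns true only if the staged wait should
 * count as a real wait. If a wake-up already set the task running, the
 * task will not actually sleep, so no wait is committed.
 */
static bool toy_schedule(struct toy_task *t)
{
	bool real = t->wait_staged && t->state != TOY_TASK_RUNNING;

	t->wait_staged = false;
	return real;
}
```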
> 
> Changes from v2:
> 
> 	1. Disable dept on bit_wait_table[] in sched/wait_bit.c, which
> 	   was reporting a lot of false positives; that was my fault.
> 	   Wait/event for bit_wait_table[] should have been tagged in a
> 	   higher layer to work better, which is future work.
> 	   (feedback from Jan Kara)
> 	2. Disable dept on crypto_larval's completion to prevent a false
> 	   positive.
> 
> Changes from v1:
> 
> 	1. Fix coding style and typo. (feedback from Steven)
> 	2. Distinguish each work context from another in workqueue.
> 	3. Skip checking lock acquisition with nest_lock, which is about
> 	   correct lock usage that should be checked by lockdep.
> 
> Changes from RFC(v0):
> 
> 	1. Add a wait tag at __schedule() rather than at prepare_to_wait().
> 	   (feedback from Linus and Matthew)
> 	2. Use try version at lockdep_acquire_cpus_lock() annotation.
> 	3. Distinguish each syscall context from another.
> 
> Byungchul Park (43):
>   llist: move llist_{head,node} definition to types.h
>   dept: implement DEPT(DEPendency Tracker)
>   dept: add single event dependency tracker APIs
>   dept: add lock dependency tracker APIs
>   dept: tie to lockdep and IRQ tracing
>   dept: add proc knobs to show stats and dependency graph
>   dept: distinguish each kernel context from another
>   x86_64, dept: add support CONFIG_ARCH_HAS_DEPT_SUPPORT to x86_64
>   arm64, dept: add support CONFIG_ARCH_HAS_DEPT_SUPPORT to arm64
>   dept: distinguish each work from another
>   dept: add a mechanism to refill the internal memory pools on running
>     out
>   dept: record the latest one out of consecutive waits of the same class
>   dept: apply sdt_might_sleep_{start,end}() to
>     wait_for_completion()/complete()
>   dept: apply sdt_might_sleep_{start,end}() to swait
>   dept: apply sdt_might_sleep_{start,end}() to waitqueue wait
>   dept: apply sdt_might_sleep_{start,end}() to hashed-waitqueue wait
>   dept: apply sdt_might_sleep_{start,end}() to dma fence
>   dept: track timeout waits separately with a new Kconfig
>   dept: apply timeout consideration to wait_for_completion()/complete()
>   dept: apply timeout consideration to swait
>   dept: apply timeout consideration to waitqueue wait
>   dept: apply timeout consideration to hashed-waitqueue wait
>   dept: apply timeout consideration to dma fence wait
>   dept: make dept able to work with an external wgen
>   dept: track PG_locked with dept
>   dept: print staged wait's stacktrace on report
>   locking/lockdep: prevent various lockdep assertions when
>     lockdep_off()'ed
>   dept: suppress reports with classes that have been already reported
>   dept: add documentation for dept
>   cpu/hotplug: use a weaker annotation in AP thread
>   fs/jbd2: use a weaker annotation in journal handling
>   dept: assign dept map to mmu notifier invalidation synchronization
>   dept: assign unique dept_key to each distinct dma fence caller
>   dept: make dept aware of lockdep_set_lock_cmp_fn() annotation
>   dept: make dept stop from working on debug_locks_off()
>   i2c: rename wait_for_completion callback to wait_for_completion_cb
>   dept: assign unique dept_key to each distinct wait_for_completion()
>     caller
>   completion, dept: introduce init_completion_dmap() API
>   dept: introduce a new type of dependency tracking between multi event
>     sites
>   dept: add module support for struct dept_event_site and
>     dept_event_site_dep
>   dept: introduce event_site() to disable event tracking if it's
>     recoverable
>   dept: implement a basic unit test for dept
>   dept: call dept_hardirqs_off() in local_irq_*() regardless of irq
>     state
> 
>  Documentation/dependency/dept.txt     |  735 ++++++
>  Documentation/dependency/dept_api.txt |  117 +
>  arch/arm64/Kconfig                    |    1 +
>  arch/arm64/kernel/syscall.c           |    7 +
>  arch/arm64/mm/fault.c                 |    7 +
>  arch/x86/Kconfig                      |    1 +
>  arch/x86/entry/syscall_64.c           |    7 +
>  arch/x86/mm/fault.c                   |    7 +
>  drivers/dma-buf/dma-fence.c           |   17 +-
>  drivers/i2c/algos/i2c-algo-pca.c      |    2 +-
>  drivers/i2c/busses/i2c-pca-isa.c      |    2 +-
>  drivers/i2c/busses/i2c-pca-platform.c |    2 +-
>  fs/jbd2/transaction.c                 |    2 +-
>  include/asm-generic/vmlinux.lds.h     |   13 +-
>  include/linux/completion.h            |  124 +-
>  include/linux/dept.h                  |  625 +++++
>  include/linux/dept_ldt.h              |   77 +
>  include/linux/dept_sdt.h              |   67 +
>  include/linux/dept_unit_test.h        |   67 +
>  include/linux/dma-fence.h             |   74 +-
>  include/linux/hardirq.h               |    3 +
>  include/linux/i2c-algo-pca.h          |    2 +-
>  include/linux/irqflags.h              |   21 +-
>  include/linux/llist.h                 |    8 -
>  include/linux/local_lock_internal.h   |    1 +
>  include/linux/lockdep.h               |  105 +-
>  include/linux/lockdep_types.h         |    3 +
>  include/linux/mm_types.h              |    2 +
>  include/linux/mmu_notifier.h          |   26 +
>  include/linux/module.h                |    5 +
>  include/linux/mutex.h                 |    1 +
>  include/linux/page-flags.h            |  125 +-
>  include/linux/pagemap.h               |    7 +-
>  include/linux/percpu-rwsem.h          |    2 +-
>  include/linux/rtmutex.h               |    1 +
>  include/linux/rwlock_types.h          |    1 +
>  include/linux/rwsem.h                 |    1 +
>  include/linux/sched.h                 |  120 +-
>  include/linux/seqlock.h               |    2 +-
>  include/linux/spinlock_types_raw.h    |    3 +
>  include/linux/srcu.h                  |    2 +-
>  include/linux/swait.h                 |    3 +
>  include/linux/types.h                 |    8 +
>  include/linux/wait.h                  |    3 +
>  include/linux/wait_bit.h              |    3 +
>  init/init_task.c                      |    2 +
>  init/main.c                           |    2 +
>  kernel/Makefile                       |    1 +
>  kernel/cpu.c                          |    2 +-
>  kernel/dependency/Makefile            |    5 +
>  kernel/dependency/dept.c              | 3510 +++++++++++++++++++++++++
>  kernel/dependency/dept_hash.h         |   10 +
>  kernel/dependency/dept_internal.h     |   64 +
>  kernel/dependency/dept_object.h       |   13 +
>  kernel/dependency/dept_proc.c         |   93 +
>  kernel/dependency/dept_unit_test.c    |  173 ++
>  kernel/exit.c                         |    1 +
>  kernel/fork.c                         |    2 +
>  kernel/locking/lockdep.c              |   33 +
>  kernel/module/main.c                  |   19 +
>  kernel/sched/completion.c             |   62 +-
>  kernel/sched/core.c                   |    8 +
>  kernel/workqueue.c                    |    3 +
>  lib/Kconfig.debug                     |   51 +
>  lib/debug_locks.c                     |    2 +
>  lib/locking-selftest.c                |    2 +
>  mm/filemap.c                          |   26 +
>  mm/mm_init.c                          |    2 +
>  mm/mmu_notifier.c                     |   31 +-
>  69 files changed, 6404 insertions(+), 125 deletions(-)
>  create mode 100644 Documentation/dependency/dept.txt
>  create mode 100644 Documentation/dependency/dept_api.txt
>  create mode 100644 include/linux/dept.h
>  create mode 100644 include/linux/dept_ldt.h
>  create mode 100644 include/linux/dept_sdt.h
>  create mode 100644 include/linux/dept_unit_test.h
>  create mode 100644 kernel/dependency/Makefile
>  create mode 100644 kernel/dependency/dept.c
>  create mode 100644 kernel/dependency/dept_hash.h
>  create mode 100644 kernel/dependency/dept_internal.h
>  create mode 100644 kernel/dependency/dept_object.h
>  create mode 100644 kernel/dependency/dept_proc.c
>  create mode 100644 kernel/dependency/dept_unit_test.c
> 
> 
> base-commit: 82f2b0b97b36ee3fcddf0f0780a9a0825d52fec3
> -- 
> 2.17.1

