Another circular lock dependency involving fs-reclaim and drm_buddy

Tue Dec 6 00:04:31 UTC 2022

We fixed a similar issue with Philip's patch "drm/amdgpu: Drop eviction 
lock when allocating PT BO", but there was another one hiding underneath 
that (see the log below). The problem is, that we're still allocating 
page tables while holding the prange->lock in the kfd_svm code, which is 
also held in MMU notifiers. This creates a lock dependency between the 
vram_mgr->lock and fs-reclaim.

There are three ways around this:

  * Do the page table mapping outside prange->lock (creates a race,
    contradicts advice in Documentation/vm/hmm.rst)
  * Avoid all page table allocation in amdgpu_vm_update_range (i.e.
    allocate page tables ahead of time somehow)
  * Wrap vram_mgr->lock with memalloc_noreclaim_save/restore (may cause
    memory allocations to fail in drm_buddy under memory pressure)

I only mention the first one for completeness. It's not a valid 
solution. Neither of the remaining two options are particularly 
appealing either.

Regards,
   Felix

[   83.979486] ======================================================
[   83.986583] WARNING: possible circular locking dependency detected
[   83.993643] 5.19.0-kfd-fkuehlin #75 Not tainted
[   83.999044] ------------------------------------------------------
[   84.006088] kfdtest/3453 is trying to acquire lock:
[   84.011820] ffff9a998561e210 (&prange->lock){+.+.}-{3:3}, at: svm_range_cpu_invalidate_pagetables+0x90/0x5e0 [amdgpu]
[   84.023911]
                but task is already holding lock:
[   84.031608] ffffffffbcd929c0 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}, at: unmap_vmas+0x5/0x170
[   84.041992]
                which lock already depends on the new lock.

[   84.052785]
                the existing dependency chain (in reverse order) is:
[   84.061993]
                -> #3 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
[   84.071548]        fs_reclaim_acquire+0x6d/0xd0
[   84.076941]        kmem_cache_alloc_trace+0x34/0x760
[   84.082766]        alloc_workqueue_attrs+0x1b/0x50
[   84.088411]        workqueue_init+0x88/0x318
[   84.093533]        kernel_init_freeable+0x134/0x28f
[   84.099258]        kernel_init+0x16/0x130
[   84.104107]        ret_from_fork+0x1f/0x30
[   84.109038]
                -> #2 (fs_reclaim){+.+.}-{0:0}:
[   84.116348]        fs_reclaim_acquire+0xa1/0xd0
[   84.121697]        kmem_cache_alloc+0x2c/0x760
[   84.126948]        drm_block_alloc.isra.0+0x27/0x50 [drm_buddy]
[   84.133679]        split_block+0x4d/0x140 [drm_buddy]
[   84.139539]        drm_buddy_alloc_blocks+0x385/0x580 [drm_buddy]
[   84.146435]        amdgpu_vram_mgr_new+0x213/0x4f0 [amdgpu]
[   84.153399]        ttm_resource_alloc+0x31/0x80 [ttm]
[   84.159366]        ttm_bo_mem_space+0x8f/0x230 [ttm]
[   84.165169]        ttm_bo_validate+0xc5/0x170 [ttm]
[   84.170872]        ttm_bo_init_reserved+0x1a6/0x230 [ttm]
[   84.177075]        amdgpu_bo_create+0x1a0/0x510 [amdgpu]
[   84.183600]        amdgpu_bo_create_reserved+0x188/0x1e0 [amdgpu]
[   84.190803]        amdgpu_bo_create_kernel_at+0x64/0x200 [amdgpu]
[   84.197994]        amdgpu_ttm_init+0x420/0x4c0 [amdgpu]
[   84.204301]        gmc_v10_0_sw_init+0x33a/0x530 [amdgpu]
[   84.210813]        amdgpu_device_init.cold+0x10d4/0x17a1 [amdgpu]
[   84.218077]        amdgpu_driver_load_kms+0x15/0x110 [amdgpu]
[   84.224919]        amdgpu_pci_probe+0x142/0x350 [amdgpu]
[   84.231313]        local_pci_probe+0x40/0x80
[   84.236437]        work_for_cpu_fn+0x10/0x20
[   84.241500]        process_one_work+0x270/0x5a0
[   84.246805]        worker_thread+0x203/0x3c0
[   84.251828]        kthread+0xea/0x110
[   84.256229]        ret_from_fork+0x1f/0x30
[   84.261061]
                -> #1 (&mgr->lock){+.+.}-{3:3}:
[   84.268156]        __mutex_lock+0x9a/0xf30
[   84.272967]        amdgpu_vram_mgr_new+0x14a/0x4f0 [amdgpu]
[   84.279752]        ttm_resource_alloc+0x31/0x80 [ttm]
[   84.285602]        ttm_bo_mem_space+0x8f/0x230 [ttm]
[   84.291321]        ttm_bo_validate+0xc5/0x170 [ttm]
[   84.296939]        ttm_bo_init_reserved+0xe2/0x230 [ttm]
[   84.302969]        amdgpu_bo_create+0x1a0/0x510 [amdgpu]
[   84.309297]        amdgpu_bo_create_vm+0x2e/0x80 [amdgpu]
[   84.315656]        amdgpu_vm_pt_create+0xf5/0x270 [amdgpu]
[   84.322090]        amdgpu_vm_ptes_update+0x6c4/0x8f0 [amdgpu]
[   84.328793]        amdgpu_vm_update_range+0x29b/0x730 [amdgpu]
[   84.335537]        svm_range_validate_and_map+0xc78/0x1390 [amdgpu]
[   84.342734]        svm_range_set_attr+0xc74/0x1170 [amdgpu]
[   84.349222]        kfd_ioctl+0x189/0x5c0 [amdgpu]
[   84.354808]        __x64_sys_ioctl+0x80/0xb0
[   84.359738]        do_syscall_64+0x35/0x80
[   84.364481]        entry_SYSCALL_64_after_hwframe+0x63/0xcd
[   84.370687]
                -> #0 (&prange->lock){+.+.}-{3:3}:
[   84.377864]        __lock_acquire+0x12ed/0x27e0
[   84.383027]        lock_acquire+0xca/0x2e0
[   84.387759]        __mutex_lock+0x9a/0xf30
[   84.392491]        svm_range_cpu_invalidate_pagetables+0x90/0x5e0 [amdgpu]
[   84.400345]        __mmu_notifier_invalidate_range_start+0x1d3/0x230
[   84.407410]        unmap_vmas+0x162/0x170
[   84.412126]        unmap_region+0xa8/0x110
[   84.416905]        __do_munmap+0x209/0x4f0
[   84.421680]        __vm_munmap+0x78/0x120
[   84.426365]        __x64_sys_munmap+0x17/0x20
[   84.431392]        do_syscall_64+0x35/0x80
[   84.436164]        entry_SYSCALL_64_after_hwframe+0x63/0xcd
[   84.442405]
                other info that might help us debug this:

[   84.452431] Chain exists of:
                  &prange->lock --> fs_reclaim --> mmu_notifier_invalidate_range_start

[   84.466395]  Possible unsafe locking scenario:

[   84.473720]        CPU0                    CPU1
[   84.479020]        ----                    ----
[   84.484296]   lock(mmu_notifier_invalidate_range_start);
[   84.490333]                                lock(fs_reclaim);
[   84.496705]                                lock(mmu_notifier_invalidate_range_start);
[   84.505246]   lock(&prange->lock);
[   84.509361]
                 *** DEADLOCK ***

[   84.517360] 2 locks held by kfdtest/3453:
[   84.522060]  #0: ffff9a99821ec4a8 (&mm->mmap_lock#2){++++}-{3:3}, at: __do_munmap+0x417/0x4f0
[   84.531287]  #1: ffffffffbcd929c0 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}, at: unmap_vmas+0x5/0x170
[   84.541896]
                stack backtrace:
[   84.547630] CPU: 3 PID: 3453 Comm: kfdtest Not tainted 5.19.0-kfd-fkuehlin #75
[   84.555537] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./EPYCD8-2T, BIOS P2.60 04/10/2020
[   84.565788] Call Trace:
[   84.568925]  <TASK>
[   84.571702]  dump_stack_lvl+0x45/0x59
[   84.576034]  check_noncircular+0xfe/0x110
[   84.580715]  ? kernel_text_address+0x6d/0xe0
[   84.585652]  __lock_acquire+0x12ed/0x27e0
[   84.590340]  lock_acquire+0xca/0x2e0
[   84.594595]  ? svm_range_cpu_invalidate_pagetables+0x90/0x5e0 [amdgpu]
[   84.602338]  __mutex_lock+0x9a/0xf30
[   84.606714]  ? svm_range_cpu_invalidate_pagetables+0x90/0x5e0 [amdgpu]
[   84.614262]  ? svm_range_cpu_invalidate_pagetables+0x90/0x5e0 [amdgpu]
[   84.621806]  ? svm_range_cpu_invalidate_pagetables+0x90/0x5e0 [amdgpu]
[   84.629353]  svm_range_cpu_invalidate_pagetables+0x90/0x5e0 [amdgpu]
[   84.636742]  ? lock_release+0x139/0x2b0
[   84.641374]  __mmu_notifier_invalidate_range_start+0x1d3/0x230
[   84.647976]  unmap_vmas+0x162/0x170
[   84.652203]  unmap_region+0xa8/0x110
[   84.656503]  __do_munmap+0x209/0x4f0
[   84.660792]  __vm_munmap+0x78/0x120
[   84.664977]  __x64_sys_munmap+0x17/0x20
[   84.669499]  do_syscall_64+0x35/0x80
[   84.673755]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[   84.679485] RIP: 0033:0x7f32872eb97b
[   84.683738] Code: 8b 15 19 35 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb 89 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 0b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e5 34 0d 00 f7 d8 64 89 01 48
[   84.703915] RSP: 002b:00007fffb06c4508 EFLAGS: 00000246 ORIG_RAX: 000000000000000b
[   84.712205] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f32872eb97b
[   84.720072] RDX: 0000000004000000 RSI: 0000000004000000 RDI: 00007f32831ae000
[   84.727944] RBP: 00007fffb06c4750 R08: 00007fffb06c4548 R09: 000055e7570ad230
[   84.735809] R10: 000055e757088010 R11: 0000000000000246 R12: 000055e75453cefa
[   84.743688] R13: 0000000000000000 R14: 0000000000000021 R15: 0000000000000000
[   84.751584]  </TASK>

-- 
F e l i x   K u e h l i n g
PMTS Software Development Engineer | Linux Compute Kernel
1 Commerce Valley Dr. East, Markham, ON L3T 7X6 Canada
(O) +1(289)695-1597
     _     _   _   _____   _____
    / \   | \ / | |  _  \  \ _  |
   / A \  | \M/ | | |D) )  /|_| |
  /_/ \_\ |_| |_| |_____/ |__/ \|   facebook.com/AMD | amd.com