<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 2025-01-07 07:30, Deng, Emily wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:PH0PR12MB5417E664CA264D62E1F2EB9B8F112@PH0PR12MB5417.namprd12.prod.outlook.com">
      <pre class="moz-quote-pre" wrap="">[AMD Official Use Only - AMD Internal Distribution Only]

Hi Felix,
    You are right, it is easy to hit the deadlock; I don't know why LOCKDEP doesn't catch this. I need to find another solution.

Hi Philip,
     Do you have a solution for this delayed pt free?
</pre>
    </blockquote>
    <p>Thanks for debugging this case. I had a patch to not free the PTB
      bo when unmapping from the GPU, but it would waste VRAM. My test
      case also passes with the tlb flush fence fix; I don't see the
      no-retry fault generated any more.<br>
    </p>
    <p>The deadlock probably comes from svm_range_unmap_from_gpu ->
      flush_workqueue(adev->wq), which runs in the mmu notifier
      callback. We only need to flush pt_free_work before mapping to
      the GPU, so please remove the flush_workqueue call from the
      unmap-from-gpu path. If the deadlock still happens, please post
      the backtrace.</p>
    <p>I think you don't need to add a new adev->wq; keep the default
      system_wq (schedule_work) and use flush_work on pt_free_work
      instead.</p>
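    <p>Something like this on top of your patch is roughly what I mean
      (just an untested sketch; it keeps pt_free_work on the default
      system_wq via schedule_work and only waits for that work item):</p>
    <pre>--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ svm_range_unmap_from_gpu @@
     pr_debug("[0x%llx 0x%llx]\n", start, last);
-    flush_workqueue(adev->wq);
 
     return amdgpu_vm_update_range(adev, vm, false, true, true, false, NULL, start,
@@ svm_range_map_to_gpu @@
-            flush_workqueue(adev->wq);
+            /* make sure delayed PT bo freeing is done before the PTs are rebuilt */
+            flush_work(&vm->pt_free_work);
 
             r = amdgpu_vm_update_range(adev, vm, false, false, flush_tlb, true,
</pre>
    <p>That way the mmu notifier path never waits on the worker, while
      the map path still flushes the pending frees before the page
      tables are rebuilt.</p>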
    <p>Regards,</p>
    <p>Philip<br>
    </p>
    <blockquote type="cite" cite="mid:PH0PR12MB5417E664CA264D62E1F2EB9B8F112@PH0PR12MB5417.namprd12.prod.outlook.com">
      <pre class="moz-quote-pre" wrap="">
Emily Deng
Best Wishes

</pre>
      <blockquote type="cite">
        <pre class="moz-quote-pre" wrap="">-----Original Message-----
From: Deng, Emily <a class="moz-txt-link-rfc2396E" href="mailto:Emily.Deng@amd.com"><Emily.Deng@amd.com></a>
Sent: Tuesday, January 7, 2025 10:34 AM
To: Deng, Emily <a class="moz-txt-link-rfc2396E" href="mailto:Emily.Deng@amd.com"><Emily.Deng@amd.com></a>; Kuehling, Felix
<a class="moz-txt-link-rfc2396E" href="mailto:Felix.Kuehling@amd.com"><Felix.Kuehling@amd.com></a>; <a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>; Yang, Philip
<a class="moz-txt-link-rfc2396E" href="mailto:Philip.Yang@amd.com"><Philip.Yang@amd.com></a>; Koenig, Christian <a class="moz-txt-link-rfc2396E" href="mailto:Christian.Koenig@amd.com"><Christian.Koenig@amd.com></a>
Subject: RE: [PATCH v2] drm/amdgpu: Fix the looply call svm_range_restore_pages
issue

[AMD Official Use Only - AMD Internal Distribution Only]

Ping....
How about this? Currently it is easy to reproduce the issue in our environment, and we
need this change to fix it.

Emily Deng
Best Wishes



</pre>
        <blockquote type="cite">
          <pre class="moz-quote-pre" wrap="">-----Original Message-----
From: amd-gfx <a class="moz-txt-link-rfc2396E" href="mailto:amd-gfx-bounces@lists.freedesktop.org"><amd-gfx-bounces@lists.freedesktop.org></a> On Behalf Of
Deng, Emily
Sent: Monday, January 6, 2025 9:52 AM
To: Kuehling, Felix <a class="moz-txt-link-rfc2396E" href="mailto:Felix.Kuehling@amd.com"><Felix.Kuehling@amd.com></a>;
<a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>; Yang, Philip <a class="moz-txt-link-rfc2396E" href="mailto:Philip.Yang@amd.com"><Philip.Yang@amd.com></a>;
Koenig, Christian <a class="moz-txt-link-rfc2396E" href="mailto:Christian.Koenig@amd.com"><Christian.Koenig@amd.com></a>
Subject: RE: [PATCH v2] drm/amdgpu: Fix the looply call
svm_range_restore_pages issue

[AMD Official Use Only - AMD Internal Distribution Only]

</pre>
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">-----Original Message-----
From: Kuehling, Felix <a class="moz-txt-link-rfc2396E" href="mailto:Felix.Kuehling@amd.com"><Felix.Kuehling@amd.com></a>
Sent: Saturday, January 4, 2025 7:18 AM
To: Deng, Emily <a class="moz-txt-link-rfc2396E" href="mailto:Emily.Deng@amd.com"><Emily.Deng@amd.com></a>; <a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>;
Yang, Philip <a class="moz-txt-link-rfc2396E" href="mailto:Philip.Yang@amd.com"><Philip.Yang@amd.com></a>; Koenig, Christian
<a class="moz-txt-link-rfc2396E" href="mailto:Christian.Koenig@amd.com"><Christian.Koenig@amd.com></a>
Subject: Re: [PATCH v2] drm/amdgpu: Fix the looply call
svm_range_restore_pages issue


On 2025-01-02 21:26, Emily Deng wrote:
</pre>
            <blockquote type="cite">
              <pre class="moz-quote-pre" wrap="">As the delayed free pt, the wanted freed bo has been reused which
will cause unexpected page fault, and then call svm_range_restore_pages.

Detail as below:
1.It wants to free the pt in follow code, but it is not freed
immediately and used “schedule_work(&vm->pt_free_work);”.

[   92.276838] Call Trace:
[   92.276841]  dump_stack+0x63/0xa0
[   92.276887]  amdgpu_vm_pt_free_list+0xfb/0x120 [amdgpu]
[   92.276932]  amdgpu_vm_update_range+0x69c/0x8e0 [amdgpu]
[   92.276990]  svm_range_unmap_from_gpus+0x112/0x310 [amdgpu]
[   92.277046]  svm_range_cpu_invalidate_pagetables+0x725/0x780 [amdgpu]
[   92.277050]  ? __alloc_pages_nodemask+0x19f/0x3e0
[   92.277051]  mn_itree_invalidate+0x72/0xc0
[   92.277052]  __mmu_notifier_invalidate_range_start+0x48/0x60
[   92.277054]  migrate_vma_collect+0xf6/0x100
[   92.277055]  migrate_vma_setup+0xcf/0x120
[   92.277109]  svm_migrate_ram_to_vram+0x256/0x6b0 [amdgpu]

2. svm_range_map_to_gpu->amdgpu_vm_update_range is called to update the
page table; at this point it reuses the same entry bo that step 1
wanted to free.

3. Then pt_free_work runs and frees the bo. Now GPU accesses to that
memory cause a page fault because the pt bo has been freed, and
svm_range_restore_pages is called again.

How to fix?
Add a workqueue, and flush the workqueue each time before updating page
</pre>
            </blockquote>
          </blockquote>
        </blockquote>
        <pre class="moz-quote-pre" wrap="">table.
</pre>
        <blockquote type="cite">
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">
I think this is kind of a known issue in the GPUVM code. Philip was
looking at it before.

Just flushing a workqueue may seem like a simple and elegant solution,
but I'm afraid it introduces lock dependency issues. By flushing the
workqueue, you're effectively creating a lock dependency of the MMU
notifier on any locks held inside the worker function. You now get a
circular lock dependency with any of those locks and memory reclaim. I
think LOCKDEP would be able to catch that if you compile your kernel
with that
</pre>
          </blockquote>
          <pre class="moz-quote-pre" wrap="">feature enabled.
</pre>
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">
The proper solution is to prevent delayed freeing of page tables if
they happened to get reused, or prevent reuse of page tables if they
are flagged for
</pre>
          </blockquote>
          <pre class="moz-quote-pre" wrap="">delayed freeing.
</pre>
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">
Regards,
  Felix

</pre>
          </blockquote>
          <pre class="moz-quote-pre" wrap="">Thanks, already enabled LOCKDEP while compiling the kernel. The delay
work seems for other reasons, I am not sure whether it could be deleted completely.

Emily Deng
Best Wishes
</pre>
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">
</pre>
            <blockquote type="cite">
              <pre class="moz-quote-pre" wrap="">
Signed-off-by: Emily Deng <a class="moz-txt-link-rfc2396E" href="mailto:Emily.Deng@amd.com"><Emily.Deng@amd.com></a>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h              | 1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c           | 7 +++++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c        | 6 +++++-
  drivers/gpu/drm/amd/amdkfd/kfd_svm.c             | 3 +++
  5 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 93c352b08969..cbf68ad1c8d0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1188,6 +1188,7 @@ struct amdgpu_device {
     struct mutex                    enforce_isolation_mutex;

     struct amdgpu_init_level *init_lvl;
+    struct workqueue_struct *wq;
  };

  static inline uint32_t amdgpu_ip_version(const struct
amdgpu_device *adev, diff --git
a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index f30548f4c3b3..5b4835bc81b3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2069,6 +2069,7 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
             if (ret)
                     goto out;
     }
+    flush_workqueue(adev->wq);

     ret = reserve_bo_and_vm(mem, avm, &ctx);
     if (unlikely(ret))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 9d6ffe38b48a..500d97cd9114 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2607,7 +2607,7 @@ void amdgpu_vm_fini(struct amdgpu_device
*adev,
</pre>
            </blockquote>
            <pre class="moz-quote-pre" wrap="">struct amdgpu_vm *vm)
</pre>
            <blockquote type="cite">
              <pre class="moz-quote-pre" wrap="">     amdgpu_amdkfd_gpuvm_destroy_cb(adev, vm);

     flush_work(&vm->pt_free_work);
-
+    cancel_work_sync(&vm->pt_free_work);
     root = amdgpu_bo_ref(vm->root.bo);
     amdgpu_bo_reserve(root, true);
     amdgpu_vm_put_task_info(vm->task_info);
@@ -2708,6 +2708,8 @@ void amdgpu_vm_manager_init(struct
amdgpu_device
</pre>
            </blockquote>
            <pre class="moz-quote-pre" wrap="">*adev)
</pre>
            <blockquote type="cite">
              <pre class="moz-quote-pre" wrap="">  #endif

     xa_init_flags(&adev->vm_manager.pasids, XA_FLAGS_LOCK_IRQ);
+    adev->wq = alloc_workqueue("amdgpu_recycle",
+                               WQ_MEM_RECLAIM | WQ_HIGHPRI |
</pre>
            </blockquote>
            <pre class="moz-quote-pre" wrap="">WQ_UNBOUND, 16);
</pre>
            <blockquote type="cite">
              <pre class="moz-quote-pre" wrap="">  }

  /**
@@ -2721,7 +2723,8 @@ void amdgpu_vm_manager_fini(struct
amdgpu_device
</pre>
            </blockquote>
            <pre class="moz-quote-pre" wrap="">*adev)
</pre>
            <blockquote type="cite">
              <pre class="moz-quote-pre" wrap="">  {
     WARN_ON(!xa_empty(&adev->vm_manager.pasids));
     xa_destroy(&adev->vm_manager.pasids);
-
+    flush_workqueue(adev->wq);
+    destroy_workqueue(adev->wq);
     amdgpu_vmid_mgr_fini(adev);
  }

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
index f78a0434a48f..1204406215ee 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -554,15 +554,19 @@ void amdgpu_vm_pt_free_work(struct work_struct
*work)

     vm = container_of(work, struct amdgpu_vm, pt_free_work);

+    printk("Emily:%s\n", __func__);
     spin_lock(&vm->status_lock);
     list_splice_init(&vm->pt_freed, &pt_freed);
     spin_unlock(&vm->status_lock);
+    printk("Emily:%s 1\n", __func__);

     /* flush_work in amdgpu_vm_fini ensure vm->root.bo is valid. */
     amdgpu_bo_reserve(vm->root.bo, true);
+    printk("Emily:%s 2\n", __func__);

     list_for_each_entry_safe(entry, next, &pt_freed, vm_status)
             amdgpu_vm_pt_free(entry);
+    printk("Emily:%s 3\n", __func__);

     amdgpu_bo_unreserve(vm->root.bo);
  }
@@ -589,7 +593,7 @@ void amdgpu_vm_pt_free_list(struct amdgpu_device
</pre>
            </blockquote>
            <pre class="moz-quote-pre" wrap="">*adev,
</pre>
            <blockquote type="cite">
              <pre class="moz-quote-pre" wrap="">             spin_lock(&vm->status_lock);
             list_splice_init(&params->tlb_flush_waitlist, &vm->pt_freed);
             spin_unlock(&vm->status_lock);
-            schedule_work(&vm->pt_free_work);
+            queue_work(adev->wq, &vm->pt_free_work);
             return;
     }

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 3e2911895c74..55edf96d5a95 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1314,6 +1314,7 @@ svm_range_unmap_from_gpu(struct amdgpu_device
</pre>
            </blockquote>
            <pre class="moz-quote-pre" wrap="">*adev, struct amdgpu_vm *vm,
</pre>
            <blockquote type="cite">
              <pre class="moz-quote-pre" wrap="">     uint64_t init_pte_value = 0;

     pr_debug("[0x%llx 0x%llx]\n", start, last);
+    flush_workqueue(adev->wq);

     return amdgpu_vm_update_range(adev, vm, false, true, true,
false, NULL,
</pre>
            </blockquote>
            <pre class="moz-quote-pre" wrap="">start,
</pre>
            <blockquote type="cite">
              <pre class="moz-quote-pre" wrap="">                                   last, init_pte_value, 0, 0, NULL,
NULL, @@ -1422,6
</pre>
            </blockquote>
            <pre class="moz-quote-pre" wrap="">+1423,8
</pre>
            <blockquote type="cite">
              <pre class="moz-quote-pre" wrap="">@@ svm_range_map_to_gpu(struct kfd_process_device *pdd, struct
svm_range
</pre>
            </blockquote>
            <pre class="moz-quote-pre" wrap="">*prange,
</pre>
            <blockquote type="cite">
              <pre class="moz-quote-pre" wrap="">              * different memory partition based on fpfn/lpfn, we should use
              * same vm_manager.vram_base_offset regardless memory partition.
              */
+            flush_workqueue(adev->wq);
+
             r = amdgpu_vm_update_range(adev, vm, false, false, flush_tlb, true,
                                        NULL, last_start, prange->start + i,
                                        pte_flags,
</pre>
            </blockquote>
          </blockquote>
        </blockquote>
      </blockquote>
      <pre class="moz-quote-pre" wrap="">
</pre>
    </blockquote>
  </body>
</html>