<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 2025-01-07 07:30, Deng, Emily wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:PH0PR12MB5417E664CA264D62E1F2EB9B8F112@PH0PR12MB5417.namprd12.prod.outlook.com">
      <pre class="moz-quote-pre" wrap="">[AMD Official Use Only - AMD Internal Distribution Only]

Hi Felix,
    You are right, it is easy to hit the deadlock; I don't know why LOCKDEP doesn't catch this. I need to find another solution.

Hi Philip,
     Do you have a solution for this delayed pt free?
</pre>
    </blockquote>
    <p>Thanks for debugging this case. I had a patch to not free the PTB
      bo when unmapping from the GPU, but it would waste VRAM. My test
      case also passes with the tlb flush fence fix; I don't see the
      no-retry fault generated any more.<br>
    </p>
    <p>The deadlock probably comes from svm_range_unmap_from_gpu ->
      flush_workqueue(adev->wq), which runs in the mmu notifier
      callback. We only need to flush pt_free_work before mapping to
      the GPU, so please remove the flush_workqueue call from the
      unmap-from-gpu path. If the deadlock still happens, please post
      the backtrace.</p>
    <p>I think you don't need to add a new adev->wq; keep the default
      system_wq (schedule_work) and use flush_work on pt_free_work
      instead.</p>
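    <p>Something like this on top of your patch is roughly what I mean
      (just an untested sketch; it keeps pt_free_work on the default
      system_wq via schedule_work and only waits for that work item):</p>
    <pre>--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ svm_range_unmap_from_gpu @@
     pr_debug("[0x%llx 0x%llx]\n", start, last);
-    flush_workqueue(adev->wq);
 
     return amdgpu_vm_update_range(adev, vm, false, true, true, false, NULL, start,
@@ svm_range_map_to_gpu @@
-            flush_workqueue(adev->wq);
+            /* make sure delayed PT bo freeing is done before the PTs are rebuilt */
+            flush_work(&vm->pt_free_work);
 
             r = amdgpu_vm_update_range(adev, vm, false, false, flush_tlb, true,
</pre>
    <p>That way the mmu notifier path never waits on the worker, while
      the map path still flushes the pending frees before the page
      tables are rebuilt.</p>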
    <p>Regards,</p>
    <p>Philip<br>
    </p>
    <blockquote type="cite" cite="mid:PH0PR12MB5417E664CA264D62E1F2EB9B8F112@PH0PR12MB5417.namprd12.prod.outlook.com">
      <pre class="moz-quote-pre" wrap="">
Emily Deng
Best Wishes

</pre>
      <blockquote type="cite">
        <pre class="moz-quote-pre" wrap="">-----Original Message-----
From: Deng, Emily <a class="moz-txt-link-rfc2396E" href="mailto:Emily.Deng@amd.com"><Emily.Deng@amd.com></a>
Sent: Tuesday, January 7, 2025 10:34 AM
To: Deng, Emily <a class="moz-txt-link-rfc2396E" href="mailto:Emily.Deng@amd.com"><Emily.Deng@amd.com></a>; Kuehling, Felix
<a class="moz-txt-link-rfc2396E" href="mailto:Felix.Kuehling@amd.com"><Felix.Kuehling@amd.com></a>; <a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>; Yang, Philip
<a class="moz-txt-link-rfc2396E" href="mailto:Philip.Yang@amd.com"><Philip.Yang@amd.com></a>; Koenig, Christian <a class="moz-txt-link-rfc2396E" href="mailto:Christian.Koenig@amd.com"><Christian.Koenig@amd.com></a>
Subject: RE: [PATCH v2] drm/amdgpu: Fix the looply call svm_range_restore_pages
issue

[AMD Official Use Only - AMD Internal Distribution Only]

Ping....
How about this? Currently it is easy to reproduce the issue in our environment, and we
need this change to fix it.

Emily Deng
Best Wishes



</pre>
        <blockquote type="cite">
          <pre class="moz-quote-pre" wrap="">-----Original Message-----
From: amd-gfx <a class="moz-txt-link-rfc2396E" href="mailto:amd-gfx-bounces@lists.freedesktop.org"><amd-gfx-bounces@lists.freedesktop.org></a> On Behalf Of
Deng, Emily
Sent: Monday, January 6, 2025 9:52 AM
To: Kuehling, Felix <a class="moz-txt-link-rfc2396E" href="mailto:Felix.Kuehling@amd.com"><Felix.Kuehling@amd.com></a>;
<a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>; Yang, Philip <a class="moz-txt-link-rfc2396E" href="mailto:Philip.Yang@amd.com"><Philip.Yang@amd.com></a>;
Koenig, Christian <a class="moz-txt-link-rfc2396E" href="mailto:Christian.Koenig@amd.com"><Christian.Koenig@amd.com></a>
Subject: RE: [PATCH v2] drm/amdgpu: Fix the looply call
svm_range_restore_pages issue

[AMD Official Use Only - AMD Internal Distribution Only]

</pre>
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">-----Original Message-----
From: Kuehling, Felix <a class="moz-txt-link-rfc2396E" href="mailto:Felix.Kuehling@amd.com"><Felix.Kuehling@amd.com></a>
Sent: Saturday, January 4, 2025 7:18 AM
To: Deng, Emily <a class="moz-txt-link-rfc2396E" href="mailto:Emily.Deng@amd.com"><Emily.Deng@amd.com></a>; <a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>;
Yang, Philip <a class="moz-txt-link-rfc2396E" href="mailto:Philip.Yang@amd.com"><Philip.Yang@amd.com></a>; Koenig, Christian
<a class="moz-txt-link-rfc2396E" href="mailto:Christian.Koenig@amd.com"><Christian.Koenig@amd.com></a>
Subject: Re: [PATCH v2] drm/amdgpu: Fix the looply call
svm_range_restore_pages issue


On 2025-01-02 21:26, Emily Deng wrote:
</pre>
            <blockquote type="cite">
              <pre class="moz-quote-pre" wrap="">As the delayed free pt, the wanted freed bo has been reused which
will cause unexpected page fault, and then call svm_range_restore_pages.

Detail as below:
1.It wants to free the pt in follow code, but it is not freed
immediately and used “schedule_work(&vm->pt_free_work);”.

[   92.276838] Call Trace:
[   92.276841]  dump_stack+0x63/0xa0
[   92.276887]  amdgpu_vm_pt_free_list+0xfb/0x120 [amdgpu]
[   92.276932]  amdgpu_vm_update_range+0x69c/0x8e0 [amdgpu]
[   92.276990]  svm_range_unmap_from_gpus+0x112/0x310 [amdgpu]
[   92.277046]  svm_range_cpu_invalidate_pagetables+0x725/0x780 [amdgpu]
[   92.277050]  ? __alloc_pages_nodemask+0x19f/0x3e0
[   92.277051]  mn_itree_invalidate+0x72/0xc0
[   92.277052]  __mmu_notifier_invalidate_range_start+0x48/0x60
[   92.277054]  migrate_vma_collect+0xf6/0x100
[   92.277055]  migrate_vma_setup+0xcf/0x120
[   92.277109]  svm_migrate_ram_to_vram+0x256/0x6b0 [amdgpu]

2. svm_range_map_to_gpu->amdgpu_vm_update_range is called to update the
page table; at this point it reuses the same entry bo that step 1
wanted to free.

3. Then pt_free_work runs and frees the bo. Now GPU accesses to that
memory cause a page fault because the pt bo has been freed, and
svm_range_restore_pages is called again.

How to fix?
Add a workqueue, and flush the workqueue each time before updating page
</pre>
            </blockquote>
          </blockquote>
        </blockquote>
        <pre class="moz-quote-pre" wrap="">table.
</pre>
        <blockquote type="cite">
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">
I think this is kind of a known issue in the GPUVM code. Philip was
looking at it before.

Just flushing a workqueue may seem like a simple and elegant solution,
but I'm afraid it introduces lock dependency issues. By flushing the
workqueue, you're effectively creating a lock dependency of the MMU
notifier on any locks held inside the worker function. You now get a
circular lock dependency with any of those locks and memory reclaim. I
think LOCKDEP would be able to catch that if you compile your kernel
with that
</pre>
          </blockquote>
          <pre class="moz-quote-pre" wrap="">feature enabled.
</pre>
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">
The proper solution is to prevent delayed freeing of page tables if
they happened to get reused, or prevent reuse of page tables if they
are flagged for
</pre>
          </blockquote>
          <pre class="moz-quote-pre" wrap="">delayed freeing.
</pre>
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">
Regards,
  Felix

</pre>
          </blockquote>
          <pre class="moz-quote-pre" wrap="">Thanks, already enabled LOCKDEP while compiling the kernel. The delay
work seems for other reasons, I am not sure whether it could be deleted completely.

Emily Deng
Best Wishes
</pre>
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">
</pre>
            <blockquote type="cite">
              <pre class="moz-quote-pre" wrap="">
Signed-off-by: Emily Deng <a class="moz-txt-link-rfc2396E" href="mailto:Emily.Deng@amd.com"><Emily.Deng@amd.com></a>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h              | 1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c           | 7 +++++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c        | 6 +++++-
  drivers/gpu/drm/amd/amdkfd/kfd_svm.c             | 3 +++
  5 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 93c352b08969..cbf68ad1c8d0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1188,6 +1188,7 @@ struct amdgpu_device {
     struct mutex                    enforce_isolation_mutex;

     struct amdgpu_init_level *init_lvl;
+    struct workqueue_struct *wq;
  };

  static inline uint32_t amdgpu_ip_version(const struct
amdgpu_device *adev, diff --git
a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index f30548f4c3b3..5b4835bc81b3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2069,6 +2069,7 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
             if (ret)
                     goto out;
     }
+    flush_workqueue(adev->wq);

     ret = reserve_bo_and_vm(mem, avm, &ctx);
     if (unlikely(ret))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 9d6ffe38b48a..500d97cd9114 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2607,7 +2607,7 @@ void amdgpu_vm_fini(struct amdgpu_device
*adev,
</pre>
            </blockquote>
            <pre class="moz-quote-pre" wrap="">struct amdgpu_vm *vm)
</pre>
            <blockquote type="cite">
              <pre class="moz-quote-pre" wrap="">     amdgpu_amdkfd_gpuvm_destroy_cb(adev, vm);

     flush_work(&vm->pt_free_work);
-
+    cancel_work_sync(&vm->pt_free_work);
     root = amdgpu_bo_ref(vm->root.bo);
     amdgpu_bo_reserve(root, true);
     amdgpu_vm_put_task_info(vm->task_info);
@@ -2708,6 +2708,8 @@ void amdgpu_vm_manager_init(struct
amdgpu_device
</pre>
            </blockquote>
            <pre class="moz-quote-pre" wrap="">*adev)
</pre>
            <blockquote type="cite">
              <pre class="moz-quote-pre" wrap="">  #endif

     xa_init_flags(&adev->vm_manager.pasids, XA_FLAGS_LOCK_IRQ);
+    adev->wq = alloc_workqueue("amdgpu_recycle",
+                               WQ_MEM_RECLAIM | WQ_HIGHPRI |
</pre>
            </blockquote>
            <pre class="moz-quote-pre" wrap="">WQ_UNBOUND, 16);
</pre>
            <blockquote type="cite">
              <pre class="moz-quote-pre" wrap="">  }

  /**
@@ -2721,7 +2723,8 @@ void amdgpu_vm_manager_fini(struct
amdgpu_device
</pre>
            </blockquote>
            <pre class="moz-quote-pre" wrap="">*adev)
</pre>
            <blockquote type="cite">
              <pre class="moz-quote-pre" wrap="">  {
     WARN_ON(!xa_empty(&adev->vm_manager.pasids));
     xa_destroy(&adev->vm_manager.pasids);
-
+    flush_workqueue(adev->wq);
+    destroy_workqueue(adev->wq);
     amdgpu_vmid_mgr_fini(adev);
  }

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
index f78a0434a48f..1204406215ee 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -554,15 +554,19 @@ void amdgpu_vm_pt_free_work(struct work_struct
*work)

     vm = container_of(work, struct amdgpu_vm, pt_free_work);

+    printk("Emily:%s\n", __func__);
     spin_lock(&vm->status_lock);
     list_splice_init(&vm->pt_freed, &pt_freed);
     spin_unlock(&vm->status_lock);
+    printk("Emily:%s 1\n", __func__);

     /* flush_work in amdgpu_vm_fini ensure vm->root.bo is valid. */
     amdgpu_bo_reserve(vm->root.bo, true);
+    printk("Emily:%s 2\n", __func__);

     list_for_each_entry_safe(entry, next, &pt_freed, vm_status)
             amdgpu_vm_pt_free(entry);
+    printk("Emily:%s 3\n", __func__);

     amdgpu_bo_unreserve(vm->root.bo);
  }
@@ -589,7 +593,7 @@ void amdgpu_vm_pt_free_list(struct amdgpu_device
</pre>
            </blockquote>
            <pre class="moz-quote-pre" wrap="">*adev,
</pre>
            <blockquote type="cite">
              <pre class="moz-quote-pre" wrap="">             spin_lock(&vm->status_lock);
             list_splice_init(&params->tlb_flush_waitlist, &vm->pt_freed);
             spin_unlock(&vm->status_lock);
-            schedule_work(&vm->pt_free_work);
+            queue_work(adev->wq, &vm->pt_free_work);
             return;
     }

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 3e2911895c74..55edf96d5a95 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1314,6 +1314,7 @@ svm_range_unmap_from_gpu(struct amdgpu_device
</pre>
            </blockquote>
            <pre class="moz-quote-pre" wrap="">*adev, struct amdgpu_vm *vm,
</pre>
            <blockquote type="cite">
              <pre class="moz-quote-pre" wrap="">     uint64_t init_pte_value = 0;

     pr_debug("[0x%llx 0x%llx]\n", start, last);
+    flush_workqueue(adev->wq);

     return amdgpu_vm_update_range(adev, vm, false, true, true,
false, NULL,
</pre>
            </blockquote>
            <pre class="moz-quote-pre" wrap="">start,
</pre>
            <blockquote type="cite">
              <pre class="moz-quote-pre" wrap="">                                   last, init_pte_value, 0, 0, NULL,
NULL, @@ -1422,6
</pre>
            </blockquote>
            <pre class="moz-quote-pre" wrap="">+1423,8
</pre>
            <blockquote type="cite">
              <pre class="moz-quote-pre" wrap="">@@ svm_range_map_to_gpu(struct kfd_process_device *pdd, struct
svm_range
</pre>
            </blockquote>
            <pre class="moz-quote-pre" wrap="">*prange,
</pre>
            <blockquote type="cite">
              <pre class="moz-quote-pre" wrap="">              * different memory partition based on fpfn/lpfn, we should use
              * same vm_manager.vram_base_offset regardless memory partition.
              */
+            flush_workqueue(adev->wq);
+
             r = amdgpu_vm_update_range(adev, vm, false, false, flush_tlb, true,
                                        NULL, last_start, prange->start + i,
                                        pte_flags,
</pre>
            </blockquote>
          </blockquote>
        </blockquote>
      </blockquote>
      <pre class="moz-quote-pre" wrap="">
</pre>
    </blockquote>
  </body>
</html>