<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 18/03/2024 16:24, Christian König
      wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:c85e5897-66cf-46b8-aa8f-c208c4a6c68a@amd.com">Am
      18.03.24 um 16:22 schrieb Sharma, Shashank:
      <br>
      <blockquote type="cite">
        <br>
        On 18/03/2024 16:01, Christian König wrote:
        <br>
        <blockquote type="cite">Am 18.03.24 um 15:44 schrieb Shashank
          Sharma:
          <br>
          <blockquote type="cite">The idea behind this patch is to delay
            the freeing of PT entry objects
            <br>
            until the TLB flush is done.
            <br>
            <br>
            This patch:
            <br>
            - Adds a tlb_flush_waitlist in amdgpu_vm_update_params which
            will keep the
            <br>
               objects that need to be freed after tlb_flush.
            <br>
            - Adds PT entries in this list in amdgpu_vm_ptes_update
            after finding
            <br>
               the PT entry.
            <br>
            - Changes functionality of amdgpu_vm_pt_free_dfs from
            (df_search + free)
            <br>
               to simply freeing of the BOs, also renames it to
            <br>
               amdgpu_vm_pt_free_list to reflect this same.
            <br>
            - Exports function amdgpu_vm_pt_free_list to be called
            directly.
            <br>
            - Calls amdgpu_vm_pt_free_list directly from
            amdgpu_vm_update_range.
            <br>
            <br>
            V2: rebase
            <br>
            V4: Addressed review comments from Christian
            <br>
                 - add only locked PTEs entries in TLB flush waitlist.
            <br>
                 - do not create a separate function for list flush.
            <br>
                 - do not create a new lock for TLB flush.
            <br>
                 - there is no need to wait on tlb_flush_fence
            exclusively.
            <br>
            <br>
            V5: Addressed review comments from Christian
            <br>
                 - change the amdgpu_vm_pt_free_dfs's functionality to
            simple freeing
            <br>
                   of the objects and rename it.
            <br>
                 - add all the PTE objects in
            params->tlb_flush_waitlist
            <br>
                 - let amdgpu_vm_pt_free_root handle the freeing of BOs
            independently
            <br>
                 - call amdgpu_vm_pt_free_list directly
            <br>
            <br>
            V6: Rebase
            <br>
            V7: Rebase
            <br>
            V8: Added a NULL check to fix this backtrace issue:
            <br>
            [  415.351447] BUG: kernel NULL pointer dereference,
            address: 0000000000000008
            <br>
            [  415.359245] #PF: supervisor write access in kernel mode
            <br>
            [  415.365081] #PF: error_code(0x0002) - not-present page
            <br>
            [  415.370817] PGD 101259067 P4D 101259067 PUD 10125a067 PMD
            0
            <br>
            [  415.377140] Oops: 0002 [#1] PREEMPT SMP NOPTI
            <br>
            [  415.382004] CPU: 0 PID: 25481 Comm: test_with_MPI.e
            Tainted: G           OE
            5.18.2-mi300-build-140423-ubuntu-22.04+ #24
            <br>
            [  415.394437] Hardware name: AMD Corporation Sh51p/Sh51p,
            BIOS RMO1001AS 02/21/2024
            <br>
            [  415.402797] RIP: 0010:amdgpu_vm_ptes_update+0x6fd/0xa10
            [amdgpu]
            <br>
            [  415.409648] Code: 4c 89 ff 4d 8d 66 30 e8 f1 ed ff ff 48
            85 db 74 42 48 39 5d a0 74 40 48 8b 53 20 48 8b 4b 18 48 8d
            43 18 48 8d 75 b0 4c 89 ff <48
            <br>
            <blockquote type="cite">89 51 08 48 89 0a 49 8b 56 30 48 89
              42 08 48 89 53 18 4c 89 63
              <br>
            </blockquote>
            [  415.430621] RSP: 0018:ffffc9000401f990 EFLAGS: 00010287
            <br>
            [  415.436456] RAX: ffff888147bb82f0 RBX: ffff888147bb82d8
            RCX: 0000000000000000
            <br>
            [  415.444426] RDX: 0000000000000000 RSI: ffffc9000401fa30
            RDI: ffff888161f80000
            <br>
            [  415.452397] RBP: ffffc9000401fa80 R08: 0000000000000000
            R09: ffffc9000401fa00
            <br>
            [  415.460368] R10: 00000007f0cc0000 R11: 00000007f0c85000
            R12: ffffc9000401fb20
            <br>
            [  415.468340] R13: 00000007f0d00000 R14: ffffc9000401faf0
            R15: ffff888161f80000
            <br>
            [  415.476312] FS:  00007f132ff89840(0000)
            GS:ffff889f87c00000(0000) knlGS:0000000000000000
            <br>
            [  415.485350] CS:  0010 DS: 0000 ES: 0000 CR0:
            0000000080050033
            <br>
            [  415.491767] CR2: 0000000000000008 CR3: 0000000161d46003
            CR4: 0000000000770ef0
            <br>
            [  415.499738] PKRU: 55555554
            <br>
            [  415.502750] Call Trace:
            <br>
            [  415.505482]  <TASK>
            <br>
            [  415.507825]  amdgpu_vm_update_range+0x32a/0x880 [amdgpu]
            <br>
            [  415.513869]  amdgpu_vm_clear_freed+0x117/0x250 [amdgpu]
            <br>
            [  415.519814]
            amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu+0x18c/0x250
            [amdgpu]
            <br>
            [  415.527729]  kfd_ioctl_unmap_memory_from_gpu+0xed/0x340
            [amdgpu]
            <br>
            [  415.534551]  kfd_ioctl+0x3b6/0x510 [amdgpu]
            <br>
            <br>
            Cc: Christian König <a class="moz-txt-link-rfc2396E" href="mailto:Christian.Koenig@amd.com"><Christian.Koenig@amd.com></a>
            <br>
            Cc: Alex Deucher <a class="moz-txt-link-rfc2396E" href="mailto:alexander.deucher@amd.com"><alexander.deucher@amd.com></a>
            <br>
            Cc: Felix Kuehling <a class="moz-txt-link-rfc2396E" href="mailto:felix.kuehling@amd.com"><felix.kuehling@amd.com></a>
            <br>
            Cc: Rajneesh Bhardwaj <a class="moz-txt-link-rfc2396E" href="mailto:rajneesh.bhardwaj@amd.com"><rajneesh.bhardwaj@amd.com></a>
            <br>
            Acked-by: Felix Kuehling <a class="moz-txt-link-rfc2396E" href="mailto:felix.kuehling@amd.com"><felix.kuehling@amd.com></a>
            <br>
            Acked-by: Rajneesh Bhardwaj
            <a class="moz-txt-link-rfc2396E" href="mailto:rajneesh.bhardwaj@amd.com"><rajneesh.bhardwaj@amd.com></a>
            <br>
            Tested-by: Rajneesh Bhardwaj
            <a class="moz-txt-link-rfc2396E" href="mailto:rajneesh.bhardwaj@amd.com"><rajneesh.bhardwaj@amd.com></a>
            <br>
            Signed-off-by: Shashank Sharma
            <a class="moz-txt-link-rfc2396E" href="mailto:shashank.sharma@amd.com"><shashank.sharma@amd.com></a>
            <br>
            ---
            <br>
              drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c    |  5 +-
            <br>
              drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h    |  7 +++
            <br>
              drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 58
            +++++++++++++----------
            <br>
              3 files changed, 45 insertions(+), 25 deletions(-)
            <br>
            <br>
            diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
            b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
            <br>
            index 26f1c3359642..eaa402f99fe0 100644
            <br>
            --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
            <br>
            +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
            <br>
            @@ -977,6 +977,7 @@ int amdgpu_vm_update_range(struct
            amdgpu_device *adev, struct amdgpu_vm *vm,
            <br>
                  params.unlocked = unlocked;
            <br>
                  params.needs_flush = flush_tlb;
            <br>
                  params.allow_override = allow_override;
            <br>
            +    INIT_LIST_HEAD(&params.tlb_flush_waitlist);
            <br>
                    /* Implicitly sync to command submissions in the
            same VM before
            <br>
                   * unmapping. Sync to moving fences before mapping.
            <br>
            @@ -1062,8 +1063,10 @@ int amdgpu_vm_update_range(struct
            amdgpu_device *adev, struct amdgpu_vm *vm,
            <br>
                  if (r)
            <br>
                      goto error_unlock;
            <br>
              -    if (params.needs_flush)
            <br>
            +    if (params.needs_flush) {
            <br>
                      r = amdgpu_vm_tlb_flush(&params, fence);
            <br>
            +        amdgpu_vm_pt_free_list(adev, &params);
            <br>
          </blockquote>
          <br>
          This is completed independent of the VM flush and should
          always be called.
          <br>
          <br>
          <blockquote type="cite">+    }
            <br>
                error_unlock:
            <br>
                  amdgpu_vm_eviction_unlock(vm);
            <br>
            diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
            b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
            <br>
            index b0a4fe683352..54d7da396de0 100644
            <br>
            --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
            <br>
            +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
            <br>
            @@ -266,6 +266,11 @@ struct amdgpu_vm_update_params {
            <br>
                   * to be overridden for NUMA local memory.
            <br>
                   */
            <br>
                  bool allow_override;
            <br>
            +
            <br>
            +    /**
            <br>
            +     * @tlb_flush_waitlist: temporary storage for BOs until
            tlb_flush
            <br>
            +     */
            <br>
            +    struct list_head tlb_flush_waitlist;
            <br>
              };
            <br>
                struct amdgpu_vm_update_funcs {
            <br>
            @@ -547,6 +552,8 @@ int amdgpu_vm_ptes_update(struct
            amdgpu_vm_update_params *params,
            <br>
                            uint64_t start, uint64_t end,
            <br>
                            uint64_t dst, uint64_t flags);
            <br>
              void amdgpu_vm_pt_free_work(struct work_struct *work);
            <br>
            +void amdgpu_vm_pt_free_list(struct amdgpu_device *adev,
            <br>
            +                struct amdgpu_vm_update_params *params);
            <br>
                #if defined(CONFIG_DEBUG_FS)
            <br>
              void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm,
            struct seq_file *m);
            <br>
            diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
            b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
            <br>
            index 601df0ce8290..9231edfb427e 100644
            <br>
            --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
            <br>
            +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
            <br>
            @@ -622,40 +622,30 @@ void amdgpu_vm_pt_free_work(struct
            work_struct *work)
            <br>
              }
            <br>
                /**
            <br>
            - * amdgpu_vm_pt_free_dfs - free PD/PT levels
            <br>
            + * amdgpu_vm_pt_free_list - free PD/PT levels
            <br>
               *
            <br>
               * @adev: amdgpu device structure
            <br>
            - * @vm: amdgpu vm structure
            <br>
            - * @start: optional cursor where to start freeing PDs/PTs
            <br>
            - * @unlocked: vm resv unlock status
            <br>
            + * @params: see amdgpu_vm_update_params definition
            <br>
               *
            <br>
            - * Free the page directory or page table level and all sub
            levels.
            <br>
            + * Free the page directory objects saved in the flush list
            <br>
               */
            <br>
            -static void amdgpu_vm_pt_free_dfs(struct amdgpu_device
            *adev,
            <br>
            -                  struct amdgpu_vm *vm,
            <br>
            -                  struct amdgpu_vm_pt_cursor *start,
            <br>
            -                  bool unlocked)
            <br>
            +void amdgpu_vm_pt_free_list(struct amdgpu_device *adev,
            <br>
            +                struct amdgpu_vm_update_params *params)
            <br>
              {
            <br>
            -    struct amdgpu_vm_pt_cursor cursor;
            <br>
            -    struct amdgpu_vm_bo_base *entry;
            <br>
            +    struct amdgpu_vm_bo_base *entry, *next;
            <br>
            +    struct amdgpu_vm *vm = params->vm;
            <br>
            +    bool unlocked = params->unlocked;
            <br>
                    if (unlocked) {
            <br>
                      spin_lock(&vm->status_lock);
            <br>
            -        for_each_amdgpu_vm_pt_dfs_safe(adev, vm, start,
            cursor, entry)
            <br>
            -            list_move(&entry->vm_status,
            &vm->pt_freed);
            <br>
            -
            <br>
            -        if (start)
            <br>
            -            list_move(&start->entry->vm_status,
            &vm->pt_freed);
            <br>
            +        list_splice_init(&vm->pt_freed,
            &params->tlb_flush_waitlist);
            <br>
                      spin_unlock(&vm->status_lock);
            <br>
                      schedule_work(&vm->pt_free_work);
            <br>
                      return;
            <br>
                  }
            <br>
              -    for_each_amdgpu_vm_pt_dfs_safe(adev, vm, start,
            cursor, entry)
            <br>
            +    list_for_each_entry_safe(entry, next,
            &params->tlb_flush_waitlist, vm_status)
            <br>
                      amdgpu_vm_pt_free(entry);
            <br>
            -
            <br>
            -    if (start)
            <br>
            -        amdgpu_vm_pt_free(start->entry);
            <br>
              }
            <br>
                /**
            <br>
            @@ -667,7 +657,13 @@ static void
            amdgpu_vm_pt_free_dfs(struct amdgpu_device *adev,
            <br>
               */
            <br>
              void amdgpu_vm_pt_free_root(struct amdgpu_device *adev,
            struct amdgpu_vm *vm)
            <br>
              {
            <br>
            -    amdgpu_vm_pt_free_dfs(adev, vm, NULL, false);
            <br>
            +    struct amdgpu_vm_pt_cursor cursor;
            <br>
            +    struct amdgpu_vm_bo_base *entry;
            <br>
            +
            <br>
            +    for_each_amdgpu_vm_pt_dfs_safe(adev, vm, NULL, cursor,
            entry) {
            <br>
            +        if (entry)
            <br>
          </blockquote>
          <br>
          Entry can't be NULL here I think.
          <br>
        </blockquote>
        Yeah, makes sense, if there is a GPU process, it will have
        at-least one page table entry.
        <br>
      </blockquote>
      <br>
      The problem is rather that entry is either the root entry or a
      child element within the array of entries.
      <br>
      <br>
      So entry can never be NULL, only entry->bo can be NULL if there
      is no PD/PT allocated for this slot.
      <br>
    </blockquote>
    <p>It looks like it can be NULL, Rajneesh has just shared a
      backtrace from the testing, which shows a NULL here: <br>
    </p>
    <p><span><span class="ui-provider a b c d e f g h i j k l m n o p q
          r s t u v w x y z ab ac ae af ag ah ai aj ak" dir="ltr">[Mar21
          06:55] BUG: unable to handle page fault for address:
          ffffc9002d637aa0<br>
          [  +0.007689] #PF: supervisor write access in kernel mode<br>
          [  +0.005833] #PF: error_code(0x0002) - not-present page<br>
          [  +0.005732] PGD 100000067 P4D 100000067 PUD 1001ec067 PMD
          4882af067 PTE 0<br>
          [  +0.007579] Oops: 0002 [#1] PREEMPT SMP NOPTI<br>
          [  +0.004861] CPU: 52 PID: 8146 Comm: kworker/52:2 Tainted:
          G           OE     5.18.2-mi300-build-140423-ubuntu-22.04+ #24<br>
          [  +0.012135] Hardware name: AMD Corporation Sh54p/Sh54p, BIOS
          WPP4311S 03/11/2024<br>
          [  +0.008254] Workqueue: events delayed_fput<br>
          [  +0.004573] RIP: 0010:amdgpu_vm_pt_free+0x66/0xe0 [amdgpu]<br>
          [  +0.006270] Code: 01 74 6e 48 c7 45 e8 00 00 00 00 31 f6 48
          83 c7 58 e8 0e ea 3b ff 48 8b 03 48 8d 78 38 e8 f2 9b 90 c0 48
          8b 43 20 48 8b 53 18 <48> 89 42 08 48 89 10 48 b8 00 01
          00 00 00 00 ad de 48 89 43 18 48<br>
          [  +0.020954] RSP: 0018:ffffc9002e117c08 EFLAGS: 00010246<br>
          [  +0.005830] RAX: ffff8884867bda20 RBX: ffff8884867bd9a8 RCX:
          0000000000000000<br>
          [  +0.007961] RDX: ffffc9002d637a98 RSI: ffff888482845458 RDI:
          ffffffffc155916e<br>
          [  +0.007958] RBP: ffffc9002e117c20 R08: 0000000000000000 R09:
          0000000000000001<br>
          [  +0.007961] R10: ffff888482843000 R11: 0000000141eed000 R12:
          ffff8884867bd9a8<br>
          [  +0.007959] R13: ffff888471d68098 R14: ffff888471d68098 R15:
          00000000c1dab300<br>
          [  +0.007960] FS:  0000000000000000(0000)
          GS:ffff88e1cf700000(0000) knlGS:0000000000000000<br>
          [  +0.009027] CS:  0010 DS: 0000 ES: 0000 CR0:
          0000000080050033<br>
          [  +0.006409] CR2: ffffc9002d637aa0 CR3: 0000000006410006 CR4:
          0000000000770ee0<br>
          [  +0.007961] PKRU: 55555554<br>
          [  +0.003016] Call Trace:<br>
          [  +0.002726]  <TASK><br>
          [  +0.002340]  amdgpu_vm_pt_free_root+0x60/0xa0 [amdgpu]<br>
          [  +0.005843]  amdgpu_vm_fini+0x2cb/0x5d0 [amdgpu]<br>
          [  +0.005248]  ? amdgpu_ctx_mgr_entity_fini+0x53/0x1c0
          [amdgpu]<br>
          [  +0.006520]  amdgpu_driver_postclose_kms+0x191/0x2d0
          [amdgpu]<br>
          [  +0.006520]  drm_file_free.part.0+0x1e5/0x260 [drm]</span></span></p>
    <p>I might have to add a follow-up patch here. <br>
    </p>
    <p>- Shashank<br>
    </p>
    <blockquote type="cite" cite="mid:c85e5897-66cf-46b8-aa8f-c208c4a6c68a@amd.com">
      <br>
      Regards,
      <br>
      Christian.
      <br>
      <br>
      <blockquote type="cite">
        <blockquote type="cite">
          <br>
          <blockquote type="cite">+            amdgpu_vm_pt_free(entry);
            <br>
            +    }
            <br>
              }
            <br>
                /**
            <br>
            @@ -972,10 +968,24 @@ int amdgpu_vm_ptes_update(struct
            amdgpu_vm_update_params *params,
            <br>
                          while (cursor.pfn < frag_start) {
            <br>
                              /* Make sure previous mapping is freed */
            <br>
                              if (cursor.entry->bo) {
            <br>
            +                    struct amdgpu_vm_pt_cursor seek;
            <br>
            +                    struct amdgpu_vm_bo_base *entry;
            <br>
            +
            <br>
                                  params->needs_flush = true;
            <br>
            -                    amdgpu_vm_pt_free_dfs(adev,
            params->vm,
            <br>
            -                                  &cursor,
            <br>
            -                                  params->unlocked);
            <br>
            + spin_lock(&params->vm->status_lock);
            <br>
            +                    for_each_amdgpu_vm_pt_dfs_safe(adev,
            params->vm, &cursor,
            <br>
            +                                       seek, entry) {
            <br>
            +                        if (!entry || !entry->bo)
            <br>
            +                            continue;
            <br>
            +
            <br>
            +                        list_move(&entry->vm_status,
            <br>
            + &params->tlb_flush_waitlist);
            <br>
            +                    }
            <br>
            +
            <br>
            +                    /* enter start node now */
            <br>
            + list_move(&cursor.entry->vm_status,
            <br>
            + &params->tlb_flush_waitlist);
            <br>
            + spin_unlock(&params->vm->status_lock);
            <br>
          </blockquote>
          <br>
          I would put this into a separate function maybe.
          <br>
        </blockquote>
        <br>
        Noted,
        <br>
        <br>
        - Shashank
        <br>
        <br>
        <blockquote type="cite">
          <br>
          Apart from those nit-picks looks good to me.
          <br>
          <br>
          Regards,
          <br>
          Christian.
          <br>
          <br>
          <blockquote type="cite">                  }
            <br>
                              amdgpu_vm_pt_next(adev, &cursor);
            <br>
                          }
            <br>
          </blockquote>
          <br>
        </blockquote>
      </blockquote>
      <br>
    </blockquote>
  </body>
</html>