[PATCH v7 2/2] drm/amdgpu: sync page table freeing with tlb flush
Bhardwaj, Rajneesh
rajneesh.bhardwaj at amd.com
Mon Mar 18 14:40:51 UTC 2024
The change you shared with me fixes the crash. Please include it in v8.
On 3/18/2024 10:06 AM, Sharma, Shashank wrote:
>
> [AMD Official Use Only - General]
>
>
> Already sent a NULL check patch based on this backtrace, I am waiting
> for Rajneesh's feedback.
>
> Regards
> Shashank
> ------------------------------------------------------------------------
> From: Bhardwaj, Rajneesh <Rajneesh.Bhardwaj at amd.com>
> Sent: Monday, March 18, 2024 3:04 PM
> To: Sharma, Shashank <Shashank.Sharma at amd.com>; amd-gfx at lists.freedesktop.org
> Cc: Koenig, Christian <Christian.Koenig at amd.com>; Deucher, Alexander <Alexander.Deucher at amd.com>; Kuehling, Felix <Felix.Kuehling at amd.com>
> Subject: Re: [PATCH v7 2/2] drm/amdgpu: sync page table freeing with tlb flush
>
> Hi Shashank,
>
> We'll probably need a v8 with the NULL pointer crash fixed, i.e. before
> freeing the PT entries, check for a valid entry before calling
> amdgpu_vm_pt_free. The crash is seen with device memory allocations,
> while system memory allocations look fine.
>
> [  127.255863] [drm] Using MTYPE_RW for local memory
> [  333.606136] hugetlbfs: test_with_MPI.e (25268): Using mlock ulimits for SHM_HUGETLB is obsolete
> [  415.351447] BUG: kernel NULL pointer dereference, address: 0000000000000008
> [  415.359245] #PF: supervisor write access in kernel mode
> [  415.365081] #PF: error_code(0x0002) - not-present page
> [  415.370817] PGD 101259067 P4D 101259067 PUD 10125a067 PMD 0
> [  415.377140] Oops: 0002 [#1] PREEMPT SMP NOPTI
> [  415.382004] CPU: 0 PID: 25481 Comm: test_with_MPI.e Tainted: G           OE     5.18.2-mi300-build-140423-ubuntu-22.04+ #24
> [  415.394437] Hardware name: AMD Corporation Sh51p/Sh51p, BIOS RMO1001AS 02/21/2024
> [  415.402797] RIP: 0010:amdgpu_vm_ptes_update+0x6fd/0xa10 [amdgpu]
> [  415.409648] Code: 4c 89 ff 4d 8d 66 30 e8 f1 ed ff ff 48 85 db 74 42 48 39 5d a0 74 40 48 8b 53 20 48 8b 4b 18 48 8d 43 18 48 8d 75 b0 4c 89 ff <48> 89 51 08 48 89 0a 49 8b 56 30 48 89 42 08 48 89 53 18 4c 89 63
> [  415.430621] RSP: 0018:ffffc9000401f990 EFLAGS: 00010287
> [  415.436456] RAX: ffff888147bb82f0 RBX: ffff888147bb82d8 RCX: 0000000000000000
> [  415.444426] RDX: 0000000000000000 RSI: ffffc9000401fa30 RDI: ffff888161f80000
> [  415.452397] RBP: ffffc9000401fa80 R08: 0000000000000000 R09: ffffc9000401fa00
> [  415.460368] R10: 00000007f0cc0000 R11: 00000007f0c85000 R12: ffffc9000401fb20
> [  415.468340] R13: 00000007f0d00000 R14: ffffc9000401faf0 R15: ffff888161f80000
> [  415.476312] FS:  00007f132ff89840(0000) GS:ffff889f87c00000(0000) knlGS:0000000000000000
> [  415.485350] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  415.491767] CR2: 0000000000000008 CR3: 0000000161d46003 CR4: 0000000000770ef0
> [  415.499738] PKRU: 55555554
> [  415.502750] Call Trace:
> [  415.505482]  <TASK>
> [  415.507825]  amdgpu_vm_update_range+0x32a/0x880 [amdgpu]
> [  415.513869]  amdgpu_vm_clear_freed+0x117/0x250 [amdgpu]
> [  415.519814]  amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu+0x18c/0x250 [amdgpu]
> [  415.527729]  kfd_ioctl_unmap_memory_from_gpu+0xed/0x340 [amdgpu]
> [  415.534551]  kfd_ioctl+0x3b6/0x510 [amdgpu]
> [  415.539324]  ? kfd_ioctl_get_dmabuf_info+0x1d0/0x1d0 [amdgpu]
> [  415.545844]  ? __fget_light+0xc5/0x100
> [  415.550037]  __x64_sys_ioctl+0x91/0xc0
> [  415.554227]  do_syscall_64+0x5c/0x80
> [  415.558223]  ? debug_smp_processor_id+0x17/0x20
> [  415.563285]  ? fpregs_assert_state_consistent+0x23/0x60
> [  415.569124]  ? exit_to_user_mode_prepare+0x45/0x190
> [  415.574572]  ? ksys_write+0xce/0xe0
>
>
> On 3/18/2024 8:08 AM, Shashank Sharma wrote:
>
> The idea behind this patch is to delay the freeing of PT entry objects
> until the TLB flush is done.
>
> This patch:
> - Adds a tlb_flush_waitlist in amdgpu_vm_update_params which will keep
>   the objects that need to be freed after tlb_flush.
> - Adds PT entries in this list in amdgpu_vm_ptes_update after finding
>   the PT entry.
> - Changes functionality of amdgpu_vm_pt_free_dfs from (df_search + free)
>   to simply freeing of the BOs, also renames it to
>   amdgpu_vm_pt_free_list to reflect the same.
> - Exports function amdgpu_vm_pt_free_list to be called directly.
> - Calls amdgpu_vm_pt_free_list directly from amdgpu_vm_update_range.
>
> V2: rebase
> V4: Addressed review comments from Christian
>     - add only locked PTEs entries in TLB flush waitlist.
>     - do not create a separate function for list flush.
>     - do not create a new lock for TLB flush.
>     - there is no need to wait on tlb_flush_fence exclusively.
> V5: Addressed review comments from Christian
>     - change the amdgpu_vm_pt_free_dfs's functionality to simple freeing
>       of the objects and rename it.
>     - add all the PTE objects in params->tlb_flush_waitlist
>     - let amdgpu_vm_pt_free_root handle the freeing of BOs independently
>     - call amdgpu_vm_pt_free_list directly
> V6: Rebase
> V7: Rebase
>
> Cc: Christian König <Christian.Koenig at amd.com>
> Cc: Alex Deucher <alexander.deucher at amd.com>
> Cc: Felix Kuehling <felix.kuehling at amd.com>
> Cc: Rajneesh Bhardwaj <rajneesh.bhardwaj at amd.com>
> Acked-by: Felix Kuehling <felix.kuehling at amd.com>
> Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj at amd.com>
> Tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj at amd.com>
> Signed-off-by: Shashank Sharma <shashank.sharma at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c    |  5 ++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h    |  7 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 53 +++++++++++++----------
>  3 files changed, 40 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 26f1c3359642..eaa402f99fe0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -977,6 +977,7 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>  	params.unlocked = unlocked;
>  	params.needs_flush = flush_tlb;
>  	params.allow_override = allow_override;
> +	INIT_LIST_HEAD(&params.tlb_flush_waitlist);
>
>  	/* Implicitly sync to command submissions in the same VM before
>  	 * unmapping. Sync to moving fences before mapping.
> @@ -1062,8 +1063,10 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>  	if (r)
>  		goto error_unlock;
>
> -	if (params.needs_flush)
> +	if (params.needs_flush) {
>  		r = amdgpu_vm_tlb_flush(&params, fence);
> +		amdgpu_vm_pt_free_list(adev, &params);
> +	}
>
>  error_unlock:
>  	amdgpu_vm_eviction_unlock(vm);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index b0a4fe683352..54d7da396de0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -266,6 +266,11 @@ struct amdgpu_vm_update_params {
>  	 * to be overridden for NUMA local memory.
>  	 */
>  	bool allow_override;
> +
> +	/**
> +	 * @tlb_flush_waitlist: temporary storage for BOs until tlb_flush
> +	 */
> +	struct list_head tlb_flush_waitlist;
>  };
>
>  struct amdgpu_vm_update_funcs {
> @@ -547,6 +552,8 @@ int amdgpu_vm_ptes_update(struct amdgpu_vm_update_params *params,
>  			  uint64_t start, uint64_t end,
>  			  uint64_t dst, uint64_t flags);
>  void amdgpu_vm_pt_free_work(struct work_struct *work);
> +void amdgpu_vm_pt_free_list(struct amdgpu_device *adev,
> +			    struct amdgpu_vm_update_params *params);
>
>  #if defined(CONFIG_DEBUG_FS)
>  void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
> index 601df0ce8290..440dc8c581fc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
> @@ -622,40 +622,30 @@ void amdgpu_vm_pt_free_work(struct work_struct *work)
>  }
>
>  /**
> - * amdgpu_vm_pt_free_dfs - free PD/PT levels
> + * amdgpu_vm_pt_free_list - free PD/PT levels
>   *
>   * @adev: amdgpu device structure
> - * @vm: amdgpu vm structure
> - * @start: optional cursor where to start freeing PDs/PTs
> - * @unlocked: vm resv unlock status
> + * @params: see amdgpu_vm_update_params definition
>   *
> - * Free the page directory or page table level and all sub levels.
> + * Free the page directory objects saved in the flush list
>   */
> -static void amdgpu_vm_pt_free_dfs(struct amdgpu_device *adev,
> -				  struct amdgpu_vm *vm,
> -				  struct amdgpu_vm_pt_cursor *start,
> -				  bool unlocked)
> +void amdgpu_vm_pt_free_list(struct amdgpu_device *adev,
> +			    struct amdgpu_vm_update_params *params)
>  {
> -	struct amdgpu_vm_pt_cursor cursor;
> -	struct amdgpu_vm_bo_base *entry;
> +	struct amdgpu_vm_bo_base *entry, *next;
> +	struct amdgpu_vm *vm = params->vm;
> +	bool unlocked = params->unlocked;
>
>  	if (unlocked) {
>  		spin_lock(&vm->status_lock);
> -		for_each_amdgpu_vm_pt_dfs_safe(adev, vm, start, cursor, entry)
> -			list_move(&entry->vm_status, &vm->pt_freed);
> -
> -		if (start)
> -			list_move(&start->entry->vm_status, &vm->pt_freed);
> +		list_splice_init(&vm->pt_freed, &params->tlb_flush_waitlist);
>  		spin_unlock(&vm->status_lock);
>  		schedule_work(&vm->pt_free_work);
>  		return;
>  	}
>
> -	for_each_amdgpu_vm_pt_dfs_safe(adev, vm, start, cursor, entry)
> +	list_for_each_entry_safe(entry, next, &params->tlb_flush_waitlist, vm_status)
>  		amdgpu_vm_pt_free(entry);
> -
> -	if (start)
> -		amdgpu_vm_pt_free(start->entry);
>  }
>
>  /**
> @@ -667,7 +657,11 @@ static void amdgpu_vm_pt_free_dfs(struct amdgpu_device *adev,
>   */
>  void amdgpu_vm_pt_free_root(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>  {
> -	amdgpu_vm_pt_free_dfs(adev, vm, NULL, false);
> +	struct amdgpu_vm_pt_cursor cursor;
> +	struct amdgpu_vm_bo_base *entry;
> +
> +	for_each_amdgpu_vm_pt_dfs_safe(adev, vm, NULL, cursor, entry)
> +		amdgpu_vm_pt_free(entry);
>  }
>
>  /**
> @@ -972,10 +966,21 @@ int amdgpu_vm_ptes_update(struct amdgpu_vm_update_params *params,
>  		while (cursor.pfn < frag_start) {
>  			/* Make sure previous mapping is freed */
>  			if (cursor.entry->bo) {
> +				struct amdgpu_vm_pt_cursor seek;
> +				struct amdgpu_vm_bo_base *entry;
> +
>  				params->needs_flush = true;
> -				amdgpu_vm_pt_free_dfs(adev, params->vm,
> -						      &cursor,
> -						      params->unlocked);
> +				spin_lock(&params->vm->status_lock);
> +				for_each_amdgpu_vm_pt_dfs_safe(adev, params->vm, &cursor,
> +							       seek, entry) {
> +					list_move(&entry->vm_status,
> +						  &params->tlb_flush_waitlist);
> +				}
> +
> +				/* enter start node now */
> +				list_move(&cursor.entry->vm_status,
> +					  &params->tlb_flush_waitlist);
> +				spin_unlock(&params->vm->status_lock);
>  			}
>  			amdgpu_vm_pt_next(adev, &cursor);
>  		}
>