[PATCH v2] drm/amdkfd: make sure VM is ready for updating operations

Yu, Lang Lang.Yu at amd.com
Tue Apr 16 02:05:04 UTC 2024


[Public]

ping

>-----Original Message-----
>From: Yu, Lang <Lang.Yu at amd.com>
>Sent: Thursday, April 11, 2024 4:11 PM
>To: amd-gfx at lists.freedesktop.org
>Cc: Koenig, Christian <Christian.Koenig at amd.com>; Kuehling, Felix
><Felix.Kuehling at amd.com>; Yu, Lang <Lang.Yu at amd.com>
>Subject: [PATCH v2] drm/amdkfd: make sure VM is ready for updating
>operations
>
>When page table BOs have been evicted but are not validated before the page
>tables are updated, the VM is still in the evicting state, so
>amdgpu_vm_update_range returns -EBUSY and restore_process_worker gets stuck
>in an endless loop.
>
>v2: Split the BO validation and page table update into two separate loops in
>amdgpu_amdkfd_restore_process_bos. (Felix)
>  1. Validate BOs
>  2. Validate VM (and DMABuf attachments)
>  3. Update page tables for the BOs validated above
>
>Fixes: 2fdba514ad5a ("drm/amdgpu: Auto-validate DMABuf imports in compute VMs")
>
>Signed-off-by: Lang Yu <Lang.Yu at amd.com>
>---
> .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 34 +++++++++++--------
> 1 file changed, 20 insertions(+), 14 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>index 0ae9fd844623..e2c9e6ddb1d1 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>@@ -2900,13 +2900,12 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence __rcu *
>
>       amdgpu_sync_create(&sync_obj);
>
>-      /* Validate BOs and map them to GPUVM (update VM page tables). */
>+      /* Validate BOs managed by KFD */
>       list_for_each_entry(mem, &process_info->kfd_bo_list,
>                           validate_list) {
>
>               struct amdgpu_bo *bo = mem->bo;
>               uint32_t domain = mem->domain;
>-              struct kfd_mem_attachment *attachment;
>               struct dma_resv_iter cursor;
>               struct dma_fence *fence;
>
>@@ -2931,6 +2930,25 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence __rcu *
>                               goto validate_map_fail;
>                       }
>               }
>+      }
>+
>+      if (failed_size)
>+              pr_debug("0x%lx/0x%lx in system\n", failed_size, total_size);
>+
>+      /* Validate PDs, PTs and evicted DMABuf imports last. Otherwise BO
>+       * validations above would invalidate DMABuf imports again.
>+       */
>+      ret = process_validate_vms(process_info, &exec.ticket);
>+      if (ret) {
>+              pr_debug("Validating VMs failed, ret: %d\n", ret);
>+              goto validate_map_fail;
>+      }
>+
>+      /* Update mappings managed by KFD. */
>+      list_for_each_entry(mem, &process_info->kfd_bo_list,
>+                          validate_list) {
>+              struct kfd_mem_attachment *attachment;
>+
>               list_for_each_entry(attachment, &mem->attachments, list) {
>                       if (!attachment->is_mapped)
>                               continue;
>@@ -2947,18 +2965,6 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence __rcu *
>               }
>       }
>
>-      if (failed_size)
>-              pr_debug("0x%lx/0x%lx in system\n", failed_size, total_size);
>-
>-      /* Validate PDs, PTs and evicted DMABuf imports last. Otherwise BO
>-       * validations above would invalidate DMABuf imports again.
>-       */
>-      ret = process_validate_vms(process_info, &exec.ticket);
>-      if (ret) {
>-              pr_debug("Validating VMs failed, ret: %d\n", ret);
>-              goto validate_map_fail;
>-      }
>-
>       /* Update mappings not managed by KFD */
>       list_for_each_entry(peer_vm, &process_info->vm_list_head,
>                       vm_list_node) {
>--
>2.25.1
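
For readers following the thread, the ordering problem described in the
commit message can be modelled outside the kernel. The sketch below is a
toy model only: every toy_* name is hypothetical and none of it is the
amdgpu API. It assumes nothing more than what the commit message states,
namely that a page table update is refused with -EBUSY while the VM is
still in the evicting state, and that validating the VM clears that state.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; these are NOT the real amdgpu structures. */
struct toy_bo {
	bool valid;   /* BO has been validated */
	bool mapped;  /* page table entries for this BO are up to date */
};

struct toy_vm {
	bool evicting; /* page table BOs were evicted and not yet validated */
};

/* Step 1 in the v2 notes: validate one BO. */
static int toy_validate_bo(struct toy_bo *bo)
{
	bo->valid = true;
	return 0;
}

/* Step 2: validating the VM takes it out of the evicting state. */
static int toy_validate_vm(struct toy_vm *vm)
{
	vm->evicting = false;
	return 0;
}

/* Step 3: a page table update is refused while the VM is evicting. */
static int toy_update_mapping(struct toy_vm *vm, struct toy_bo *bo)
{
	assert(bo->valid); /* only validated BOs may be mapped */
	if (vm->evicting)
		return -EBUSY;
	bo->mapped = true;
	return 0;
}

/* Old ordering: validate and map in one loop, validate the VM last. */
static int toy_restore_single_loop(struct toy_vm *vm, struct toy_bo *bos,
				   size_t n)
{
	int ret;

	for (size_t i = 0; i < n; i++) {
		ret = toy_validate_bo(&bos[i]);
		if (ret)
			return ret;
		ret = toy_update_mapping(vm, &bos[i]);
		if (ret)
			return ret; /* -EBUSY: VM not validated yet */
	}
	return toy_validate_vm(vm);
}

/* New ordering: validate BOs, then the VM, then update the mappings. */
static int toy_restore_split_loops(struct toy_vm *vm, struct toy_bo *bos,
				   size_t n)
{
	int ret;

	for (size_t i = 0; i < n; i++) {
		ret = toy_validate_bo(&bos[i]);
		if (ret)
			return ret;
	}

	ret = toy_validate_vm(vm);
	if (ret)
		return ret;

	for (size_t i = 0; i < n; i++) {
		ret = toy_update_mapping(vm, &bos[i]);
		if (ret)
			return ret;
	}
	return 0;
}
```

In this model, calling toy_restore_single_loop() on a VM whose evicting
flag is set fails with -EBUSY on the first BO and never reaches the VM
validation, so a caller that simply retries sees the same failure every
time. toy_restore_split_loops() clears the flag before any mapping update
and completes.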


