[PATCH v2] drm/amdkfd: make sure VM is ready for updating operations

Christian König ckoenig.leichtzumerken at gmail.com
Tue Apr 16 09:07:46 UTC 2024


Looks valid to me offhand, but it's really Felix who needs to judge this.

On the other hand, if it blocks any CI, feel free to add my Acked-by and
submit it.

Christian.

On 16.04.24 at 04:05, Yu, Lang wrote:
> [Public]
>
> ping
>
>> -----Original Message-----
>> From: Yu, Lang <Lang.Yu at amd.com>
>> Sent: Thursday, April 11, 2024 4:11 PM
>> To: amd-gfx at lists.freedesktop.org
>> Cc: Koenig, Christian <Christian.Koenig at amd.com>; Kuehling, Felix
>> <Felix.Kuehling at amd.com>; Yu, Lang <Lang.Yu at amd.com>
>> Subject: [PATCH v2] drm/amdkfd: make sure VM is ready for updating
>> operations
>>
>> If page table BOs were evicted but not validated before the page tables
>> are updated, the VM is still in the evicting state, so
>> amdgpu_vm_update_range returns -EBUSY and restore_process_worker runs into
>> an endless loop.
>>
>> v2: Split the BO validation and page table update into two separate loops in
>> amdgpu_amdkfd_restore_process_bos. (Felix)
>>   1. Validate BOs
>>   2. Validate VM (and DMABuf attachments)
>>   3. Update page tables for the BOs validated above
>>
>> Fixes: 2fdba514ad5a ("drm/amdgpu: Auto-validate DMABuf imports in compute VMs")
>>
>> Signed-off-by: Lang Yu <Lang.Yu at amd.com>
>> ---
>> .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 34 +++++++++++--------
>> 1 file changed, 20 insertions(+), 14 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> index 0ae9fd844623..e2c9e6ddb1d1 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> @@ -2900,13 +2900,12 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence __rcu *
>>
>>        amdgpu_sync_create(&sync_obj);
>>
>> -      /* Validate BOs and map them to GPUVM (update VM page tables). */
>> +      /* Validate BOs managed by KFD */
>>        list_for_each_entry(mem, &process_info->kfd_bo_list,
>>                            validate_list) {
>>
>>                struct amdgpu_bo *bo = mem->bo;
>>                uint32_t domain = mem->domain;
>> -              struct kfd_mem_attachment *attachment;
>>                struct dma_resv_iter cursor;
>>                struct dma_fence *fence;
>>
>> @@ -2931,6 +2930,25 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence __rcu *
>>                                goto validate_map_fail;
>>                        }
>>                }
>> +      }
>> +
>> +      if (failed_size)
>> +              pr_debug("0x%lx/0x%lx in system\n", failed_size, total_size);
>> +
>> +      /* Validate PDs, PTs and evicted DMABuf imports last. Otherwise BO
>> +       * validations above would invalidate DMABuf imports again.
>> +       */
>> +      ret = process_validate_vms(process_info, &exec.ticket);
>> +      if (ret) {
>> +              pr_debug("Validating VMs failed, ret: %d\n", ret);
>> +              goto validate_map_fail;
>> +      }
>> +
>> +      /* Update mappings managed by KFD. */
>> +      list_for_each_entry(mem, &process_info->kfd_bo_list,
>> +                          validate_list) {
>> +              struct kfd_mem_attachment *attachment;
>> +
>>                list_for_each_entry(attachment, &mem->attachments, list) {
>>                        if (!attachment->is_mapped)
>>                                continue;
>> @@ -2947,18 +2965,6 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence __rcu *
>>                }
>>        }
>>
>> -      if (failed_size)
>> -              pr_debug("0x%lx/0x%lx in system\n", failed_size, total_size);
>> -
>> -      /* Validate PDs, PTs and evicted DMABuf imports last. Otherwise BO
>> -       * validations above would invalidate DMABuf imports again.
>> -       */
>> -      ret = process_validate_vms(process_info, &exec.ticket);
>> -      if (ret) {
>> -              pr_debug("Validating VMs failed, ret: %d\n", ret);
>> -              goto validate_map_fail;
>> -      }
>> -
>>        /* Update mappings not managed by KFD */
>>        list_for_each_entry(peer_vm, &process_info->vm_list_head,
>>                        vm_list_node) {
>> --
>> 2.25.1
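
For readers skimming the diff, the reordering described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the amdgpu code: every name below (model_vm, model_bo, model_update_range, restore_single_loop, restore_split_loops and the demo helpers) is hypothetical, and the only behaviour taken from the patch description is that a page table update fails with -EBUSY while the VM is still in the evicting state.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the amdgpu structures. */
struct model_vm {
	bool evicting;	/* true while PDs/PTs are evicted and not yet validated */
};

struct model_bo {
	bool valid;	/* set once the BO has been validated */
	bool mapped;	/* set once its page table entries have been updated */
};

/* Models the described behaviour: refuse updates while the VM is evicting. */
static int model_update_range(struct model_vm *vm, struct model_bo *bo)
{
	if (vm->evicting)
		return -EBUSY;
	bo->mapped = true;
	return 0;
}

static void model_validate_bo(struct model_bo *bo) { bo->valid = true; }
static void model_validate_vm(struct model_vm *vm) { vm->evicting = false; }

/* Old ordering: validate and map each BO in one loop, validate the VM last. */
static int restore_single_loop(struct model_vm *vm, struct model_bo *bos, size_t n)
{
	assert(vm && bos);
	for (size_t i = 0; i < n; i++) {
		model_validate_bo(&bos[i]);
		int ret = model_update_range(vm, &bos[i]);
		if (ret)
			return ret;	/* the caller would retry and fail again */
	}
	model_validate_vm(vm);
	return 0;
}

/* v2 ordering: 1. validate BOs, 2. validate the VM, 3. update page tables. */
static int restore_split_loops(struct model_vm *vm, struct model_bo *bos, size_t n)
{
	assert(vm && bos);
	for (size_t i = 0; i < n; i++)
		model_validate_bo(&bos[i]);

	model_validate_vm(vm);

	for (size_t i = 0; i < n; i++) {
		int ret = model_update_range(vm, &bos[i]);
		if (ret)
			return ret;
	}
	return 0;
}

/* Both helpers start from a VM whose page tables were evicted. */
int demo_single_loop(void)
{
	struct model_vm vm = { .evicting = true };
	struct model_bo bos[2] = { 0 };

	return restore_single_loop(&vm, bos, 2);
}

int demo_split_loops(void)
{
	struct model_vm vm = { .evicting = true };
	struct model_bo bos[2] = { 0 };

	return restore_split_loops(&vm, bos, 2);
}
```

In this model demo_single_loop() returns -EBUSY on the first update because the VM has not been validated yet, while demo_split_loops() completes because the VM leaves the evicting state before any page table update is attempted.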


