[PATCH] drm/amdkfd: keep BOs in system memory if restore failed
Felix Kuehling
felix.kuehling at amd.com
Mon Nov 30 22:54:45 UTC 2020
On 2020-11-30 at 5:48 p.m., Philip Yang wrote:
> If VRAM is used up, a display VRAM allocation evicts the KFD BOs to
> system memory. KFD then schedules restore work to move the BOs back to
> VRAM. If the display BOs are pinned in VRAM, the KFD restore work keeps
> retrying and may never succeed.
>
> If restoring a BO to VRAM fails, keep the BO in system memory to
> prevent endless restore retries; the GPU mapping is then updated to
> point to system memory.
>
> Signed-off-by: Philip Yang <Philip.Yang at amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>
> ---
> .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 17 +++++++++++++++--
> 1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 703cd5a7b8f7..e54883ff74d2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -2042,6 +2042,8 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
> int ret = 0, i;
> struct list_head duplicate_save;
> struct amdgpu_sync sync_obj;
> + unsigned long failed_size = 0;
> + unsigned long total_size = 0;
>
> INIT_LIST_HEAD(&duplicate_save);
> INIT_LIST_HEAD(&ctx.list);
> @@ -2098,10 +2100,18 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
> uint32_t domain = mem->domain;
> struct kfd_bo_va_list *bo_va_entry;
>
> + total_size += amdgpu_bo_size(bo);
> +
> ret = amdgpu_amdkfd_bo_validate(bo, domain, false);
> if (ret) {
> - pr_debug("Memory eviction: Validate BOs failed. Try again\n");
> - goto validate_map_fail;
> + pr_debug("Memory eviction: Validate BOs failed\n");
> + failed_size += amdgpu_bo_size(bo);
> + ret = amdgpu_amdkfd_bo_validate(bo,
> + AMDGPU_GEM_DOMAIN_GTT, false);
> + if (ret) {
> + pr_debug("Memory eviction: Try again\n");
> + goto validate_map_fail;
> + }
> }
> ret = amdgpu_sync_fence(&sync_obj, bo->tbo.moving);
> if (ret) {
> @@ -2121,6 +2131,9 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
> }
> }
>
> + if (failed_size)
> + pr_debug("0x%lx/0x%lx in system\n", failed_size, total_size);
> +
> /* Update page directories */
> ret = process_update_pds(process_info, &sync_obj);
> if (ret) {
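
For readers following the logic of the hunk above outside the kernel tree, here is a minimal user-space sketch of the same fallback pattern: try the preferred domain first, fall back to system memory (GTT) on failure, and only give up if the fallback also fails. Everything prefixed with mock_ is a hypothetical stand-in invented for illustration, not amdgpu or KFD API; the real code calls amdgpu_amdkfd_bo_validate() and amdgpu_bo_size() as shown in the diff, and real placement is decided by TTM rather than by simple free-byte counters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel memory domains (not amdgpu API). */
enum mock_domain { MOCK_DOMAIN_VRAM, MOCK_DOMAIN_GTT };

/* Hypothetical buffer object: size, preferred domain, current placement. */
struct mock_bo {
	unsigned long size;
	enum mock_domain preferred;
	enum mock_domain placed;
};

/* Hypothetical memory pool: free bytes left in each domain. */
struct mock_pool {
	unsigned long vram_free;
	unsigned long gtt_free;
};

/* Place the BO in the requested domain; return 0 on success, -1 if full. */
static int mock_validate(struct mock_pool *pool, struct mock_bo *bo,
			 enum mock_domain domain)
{
	unsigned long *free_bytes = (domain == MOCK_DOMAIN_VRAM) ?
				    &pool->vram_free : &pool->gtt_free;

	if (*free_bytes < bo->size)
		return -1;
	*free_bytes -= bo->size;
	bo->placed = domain;
	return 0;
}

/*
 * Restore every BO to its preferred domain. If that fails, count the BO
 * in *failed_size and retry in GTT, mirroring the retry in the patch.
 * Return -1 only when the GTT fallback fails too, which is the case
 * where the caller would schedule another restore attempt.
 */
static int mock_restore(struct mock_pool *pool, struct mock_bo *bos, size_t n,
			unsigned long *failed_size, unsigned long *total_size)
{
	size_t i;

	*failed_size = 0;
	*total_size = 0;

	for (i = 0; i < n; i++) {
		*total_size += bos[i].size;

		if (mock_validate(pool, &bos[i], bos[i].preferred) == 0)
			continue;

		/* Preferred domain is full: keep the BO in system memory. */
		*failed_size += bos[i].size;
		if (mock_validate(pool, &bos[i], MOCK_DOMAIN_GTT))
			return -1;
	}
	return 0;
}
```

Calling mock_restore() with a pool that has room for only one of two VRAM-preferring BOs should leave the second one in GTT and report its size in failed_size, which corresponds to the "in system" debug message added by the patch.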