[PATCH] drm/amdkfd: keep BOs in system memory if restore failed

Felix Kuehling felix.kuehling at amd.com
Mon Nov 30 22:54:45 UTC 2020


On 2020-11-30 at 5:48 p.m., Philip Yang wrote:
> If VRAM is used up, display VRAM allocations evict the KFD BOs to system
> memory, and KFD schedules restore work to move the BOs back to VRAM. If
> the display BOs are pinned in VRAM, the KFD restore work keeps retrying
> and may never succeed.
>
> If restoring a BO to VRAM fails, keep the BO in system memory to
> prevent endless restore retries; the GPU mapping is then updated to
> point to system memory.
>
> Signed-off-by: Philip Yang <Philip.Yang at amd.com>

Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>


> ---
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c    | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 703cd5a7b8f7..e54883ff74d2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -2042,6 +2042,8 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
>  	int ret = 0, i;
>  	struct list_head duplicate_save;
>  	struct amdgpu_sync sync_obj;
> +	unsigned long failed_size = 0;
> +	unsigned long total_size = 0;
>  
>  	INIT_LIST_HEAD(&duplicate_save);
>  	INIT_LIST_HEAD(&ctx.list);
> @@ -2098,10 +2100,18 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
>  		uint32_t domain = mem->domain;
>  		struct kfd_bo_va_list *bo_va_entry;
>  
> +		total_size += amdgpu_bo_size(bo);
> +
>  		ret = amdgpu_amdkfd_bo_validate(bo, domain, false);
>  		if (ret) {
> -			pr_debug("Memory eviction: Validate BOs failed. Try again\n");
> -			goto validate_map_fail;
> +			pr_debug("Memory eviction: Validate BOs failed\n");
> +			failed_size += amdgpu_bo_size(bo);
> +			ret = amdgpu_amdkfd_bo_validate(bo,
> +						AMDGPU_GEM_DOMAIN_GTT, false);
> +			if (ret) {
> +				pr_debug("Memory eviction: Try again\n");
> +				goto validate_map_fail;
> +			}
>  		}
>  		ret = amdgpu_sync_fence(&sync_obj, bo->tbo.moving);
>  		if (ret) {
> @@ -2121,6 +2131,9 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
>  		}
>  	}
>  
> +	if (failed_size)
> +		pr_debug("0x%lx/0x%lx in system\n", failed_size, total_size);
> +
>  	/* Update page directories */
>  	ret = process_update_pds(process_info, &sync_obj);
>  	if (ret) {
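
For readers following the thread, the fallback in the hunk above can be
sketched as a small user-space simulation. This is only an illustration
under stated assumptions, not driver code: sim_bo, sim_validate,
sim_restore, restore_stats and the DOMAIN_* values are hypothetical
stand-ins, and the "VRAM fits if free space remains" rule is a
simplification of real placement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the memory domains named in the patch. */
enum mem_domain { DOMAIN_VRAM, DOMAIN_GTT };

struct sim_bo {
	unsigned long size;
	enum mem_domain preferred;	/* domain the BO would like to live in */
	enum mem_domain placed;		/* domain it ended up in */
};

struct restore_stats {
	unsigned long failed_size;	/* bytes that fell back to system memory */
	unsigned long total_size;	/* bytes considered for restore */
};

/*
 * Assumed placement rule for the simulation: VRAM placement succeeds only
 * while enough free VRAM remains; system memory (GTT) always succeeds.
 */
static int sim_validate(struct sim_bo *bo, enum mem_domain domain,
			unsigned long *vram_free)
{
	if (domain == DOMAIN_VRAM) {
		if (bo->size > *vram_free)
			return -1;
		*vram_free -= bo->size;
	}
	bo->placed = domain;
	return 0;
}

/*
 * Mirrors the control flow of the patched loop: try the preferred domain,
 * fall back to GTT on failure, and give up (so a caller could retry later)
 * only if the fallback fails too.
 */
static int sim_restore(struct sim_bo *bos, size_t n, unsigned long vram_free,
		       struct restore_stats *stats)
{
	assert(stats != NULL);
	stats->failed_size = 0;
	stats->total_size = 0;

	for (size_t i = 0; i < n; i++) {
		stats->total_size += bos[i].size;

		if (sim_validate(&bos[i], bos[i].preferred, &vram_free)) {
			stats->failed_size += bos[i].size;
			if (sim_validate(&bos[i], DOMAIN_GTT, &vram_free))
				return -1;
		}
	}

	if (stats->failed_size)
		printf("0x%lx/0x%lx in system\n",
		       stats->failed_size, stats->total_size);
	return 0;
}
```

Calling sim_restore() with less free VRAM than the BOs need leaves the
BOs that do not fit in DOMAIN_GTT and still returns 0, which is the
behaviour the commit message asks for: a restore that completes instead
of retrying indefinitely.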

