[PATCH] drm/amdkfd: add pinned BOs to kfd_bo_list

Felix Kuehling felix.kuehling at amd.com
Tue May 31 15:22:15 UTC 2022


Am 2022-05-31 um 04:34 schrieb Lang Yu:
> The kfd_bo_list is used to restore process BOs after
> evictions. As page tables could be destroyed during
> evictions, we should also update pinned BOs' page tables
> during restoring to make sure they are valid.
>
> So for pinned BOs,
> 1, Don't validate them, but update their page tables.
> 2, Don't add eviction fence for them.
>
> Signed-off-by: Lang Yu <Lang.Yu at amd.com>
> ---
>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 39 ++++++++++---------
>   1 file changed, 20 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 34ba9e776521..67c4bf1944d2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -1953,9 +1953,6 @@ int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct amdgpu_device *adev,
>   		return -EINVAL;
>   	}
>   
> -	/* delete kgd_mem from kfd_bo_list to avoid re-validating
> -	 * this BO in BO's restoring after eviction.
> -	 */
>   	mutex_lock(&mem->process_info->lock);
>   
>   	ret = amdgpu_bo_reserve(bo, true);
> @@ -1978,7 +1975,6 @@ int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct amdgpu_device *adev,
>   
>   	amdgpu_amdkfd_remove_eviction_fence(
>   		bo, mem->process_info->eviction_fence);
> -	list_del_init(&mem->validate_list.head);
>   
>   	if (size)
>   		*size = amdgpu_bo_size(bo);
> @@ -2481,24 +2477,26 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
>   		uint32_t domain = mem->domain;
>   		struct kfd_mem_attachment *attachment;
>   
> -		total_size += amdgpu_bo_size(bo);
> +		if (!bo->tbo.pin_count) {

I think this special case is not necessary. Validating pinned BOs that 
are already valid should have very low overhead. So adding a special 
case to avoid that is not really worth it.

Other than that, this patch looks good to me.

Regards,
   Felix


> +			total_size += amdgpu_bo_size(bo);
>   
> -		ret = amdgpu_amdkfd_bo_validate(bo, domain, false);
> -		if (ret) {
> -			pr_debug("Memory eviction: Validate BOs failed\n");
> -			failed_size += amdgpu_bo_size(bo);
> -			ret = amdgpu_amdkfd_bo_validate(bo,
> -						AMDGPU_GEM_DOMAIN_GTT, false);
> +			ret = amdgpu_amdkfd_bo_validate(bo, domain, false);
> +			if (ret) {
> +				pr_debug("Memory eviction: Validate BOs failed\n");
> +				failed_size += amdgpu_bo_size(bo);
> +				ret = amdgpu_amdkfd_bo_validate(bo, AMDGPU_GEM_DOMAIN_GTT, false);
> +				if (ret) {
> +					pr_debug("Memory eviction: Try again\n");
> +					goto validate_map_fail;
> +				}
> +			}
> +			ret = amdgpu_sync_fence(&sync_obj, bo->tbo.moving);
>   			if (ret) {
> -				pr_debug("Memory eviction: Try again\n");
> +				pr_debug("Memory eviction: Sync BO fence failed. Try again\n");
>   				goto validate_map_fail;
>   			}
>   		}
> -		ret = amdgpu_sync_fence(&sync_obj, bo->tbo.moving);
> -		if (ret) {
> -			pr_debug("Memory eviction: Sync BO fence failed. Try again\n");
> -			goto validate_map_fail;
> -		}
> +
>   		list_for_each_entry(attachment, &mem->attachments, list) {
>   			if (!attachment->is_mapped)
>   				continue;
> @@ -2544,10 +2542,13 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
>   
>   	/* Attach new eviction fence to all BOs */
>   	list_for_each_entry(mem, &process_info->kfd_bo_list,
> -		validate_list.head)
> +		validate_list.head) {
> +		if (mem->bo->tbo.pin_count)
> +			continue;
> +
>   		amdgpu_bo_fence(mem->bo,
>   			&process_info->eviction_fence->base, true);
> -
> +	}
>   	/* Attach eviction fence to PD / PT BOs */
>   	list_for_each_entry(peer_vm, &process_info->vm_list_head,
>   			    vm_list_node) {


More information about the amd-gfx mailing list