[PATCH] drm/amdkfd: Fix an issue at userptr buffer validation process.

Felix Kuehling felix.kuehling at amd.com
Fri Apr 21 15:40:16 UTC 2023


On 2023-04-20 18:25, Xiaogang.Chen wrote:
> From: Xiaogang Chen <xiaogang.chen at amd.com>
>
> amdgpu_ttm_tt_get_user_pages can fail(-EFAULT). If it failed mem has no associated
> hmm range or user_pages associated. Keep it at process_info->userptr_inval_list and
> mark mem->invalid until following scheduled attempts can valid it.

Thank you! This patch looks good to me. One more nit-pick inline.


>
> Signed-off-by: Xiaogang Chen <Xiaogang.Chen at amd.com>
> ---
>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 28 ++++++++++++++-----
>   1 file changed, 21 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 7b1f5933ebaa..fad5183baf80 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -2444,7 +2444,9 @@ static int update_invalid_user_pages(struct amdkfd_process_info *process_info,
>   			ret = -EAGAIN;
>   			goto unlock_out;
>   		}
> -		mem->invalid = 0;
> +		 /* set mem valid if mem has hmm range associated */
> +		if (mem->range)
> +			mem->invalid = 0;
>   	}
>   
>   unlock_out:
> @@ -2576,16 +2578,28 @@ static int confirm_valid_user_pages_locked(struct amdkfd_process_info *process_i
>   	list_for_each_entry_safe(mem, tmp_mem,
>   				 &process_info->userptr_inval_list,
>   				 validate_list.head) {
> -		bool valid = amdgpu_ttm_tt_get_user_pages_done(
> -				mem->bo->tbo.ttm, mem->range);
> +		/* Only check mem with hmm range associated */
> +		bool valid;
>   
> -		mem->range = NULL;
> -		if (!valid) {
> -			WARN(!mem->invalid, "Invalid BO not marked invalid");
> +		if (mem->range) {

You can avoid unnecessary nesting and make this more readable by 
inverting the condition:

		if (!mem->range)
			continue;

		valid = amdgpu_ttm_tt_get_user_pages_done(
				mem->bo->tbo.ttm, mem->range);
		...

With that fixed, the patch is

Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>


> +			valid = amdgpu_ttm_tt_get_user_pages_done(
> +					mem->bo->tbo.ttm, mem->range);
> +
> +			mem->range = NULL;
> +			if (!valid) {
> +				WARN(!mem->invalid, "Invalid BO not marked invalid");
> +				ret = -EAGAIN;
> +				continue;
> +			}
> +		} else
> +			/* keep mem without hmm range at userptr_inval_list */
> +			continue;
> +
> +		if (mem->invalid) {
> +			WARN(1, "Valid BO is marked invalid");
>   			ret = -EAGAIN;
>   			continue;
>   		}
> -		WARN(mem->invalid, "Valid BO is marked invalid");
>   
>   		list_move_tail(&mem->validate_list.head,
>   			       &process_info->userptr_valid_list);


More information about the amd-gfx mailing list