[PATCH] drm/amdgpu: fix potential VM faults

Thu Sep 19 14:29:44 UTC 2019

I'm not disagreeing with the change; I'm just trying to understand how
this could have caused a VM fault. If the page tables are reserved or
fenced while you allocate a new one, they would not be evicted. If they
are not reserved or fenced, there should be no expectation that they
stay resident.

Is this related to recoverable page fault handling? Do we need some more 
generic way to handle eviction of page tables and update the parent page 
directory (invalidate the corresponding PDE)?

Regards,
   Felix

On 2019-09-19 4:41, Christian König wrote:
> When we allocate new page tables under memory
> pressure we should not evict old ones.
>
> Signed-off-by: Christian König <christian.koenig at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 70d45d48907a..8e44ecaada35 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -514,7 +514,8 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
>   		.interruptible = (bp->type != ttm_bo_type_kernel),
>   		.no_wait_gpu = bp->no_wait_gpu,
>   		.resv = bp->resv,
> -		.flags = TTM_OPT_FLAG_ALLOW_RES_EVICT
> +		.flags = bp->type != ttm_bo_type_kernel ?
> +			TTM_OPT_FLAG_ALLOW_RES_EVICT : 0
>   	};
>   	struct amdgpu_bo *bo;
>   	unsigned long page_align, size = bp->size;
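
For readers following the hunk above, here is a minimal standalone sketch of the flag selection it introduces. It is not the kernel code: `bo_type`, `op_ctx`, `make_ctx` and `OPT_FLAG_ALLOW_RES_EVICT` are hypothetical stand-ins for `ttm_bo_type`, `ttm_operation_ctx`, the initializer in `amdgpu_bo_do_create()` and `TTM_OPT_FLAG_ALLOW_RES_EVICT`, and the numeric values are made up. It also assumes that page table BOs are created as kernel-type BOs, which is what makes the type check cover them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for enum ttm_bo_type; values are illustrative. */
enum bo_type { BO_TYPE_DEVICE, BO_TYPE_KERNEL, BO_TYPE_SG };

/* Hypothetical stand-in for TTM_OPT_FLAG_ALLOW_RES_EVICT. */
#define OPT_FLAG_ALLOW_RES_EVICT 0x1u

/* Hypothetical, reduced stand-in for struct ttm_operation_ctx. */
struct op_ctx {
	bool interruptible;
	uint32_t flags;
};

/*
 * Mirrors the patched initializer: kernel-type BOs (assumed here to
 * include page tables) get neither an interruptible wait nor the
 * "allow eviction" flag; every other BO type keeps both.
 */
static struct op_ctx make_ctx(enum bo_type type)
{
	struct op_ctx ctx = {
		.interruptible = (type != BO_TYPE_KERNEL),
		.flags = (type != BO_TYPE_KERNEL) ?
			OPT_FLAG_ALLOW_RES_EVICT : 0,
	};

	return ctx;
}
```

With this model, `make_ctx(BO_TYPE_KERNEL).flags` is 0 while `make_ctx(BO_TYPE_DEVICE).flags` still carries the eviction flag, so only kernel-type allocations change behaviour under the patch.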