[RFC PATCH] drm/amdgpu: Remove eviction fence before release bo

Christian König christian.koenig at amd.com
Wed Feb 5 14:02:15 UTC 2020


Am 05.02.20 um 13:56 schrieb Pan, Xinhui:
> No need to trigger eviction as the memory mapping will not be used anymore.
>
> All pt/pd bos share same resv, hence the same shared eviction fence. Everytime page table is freed, the fence will be signled and that cuases kfd unexcepted evictions.
>
> kfd bo uses its own resv, so it is not affetced.
>
> Signed-off-by: xinhui pan <xinhui.pan at amd.com>
> ---
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index 47b0f29..265b1ed 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -96,6 +96,7 @@
>   						       struct mm_struct *mm);
>   bool amdkfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm);
>   struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f);
> +int amdgpu_amdkfd_remove_fence_on_pt_pd_bos(struct amdgpu_bo *bo);
>   
>   struct amdkfd_process_info {
>   	/* List head of all VMs that belong to a KFD process */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index ef721cb..a3c55ad 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -276,6 +276,26 @@
>   	return 0;
>   }
>   
> +int amdgpu_amdkfd_remove_fence_on_pt_pd_bos(struct amdgpu_bo *bo)
> +{
> +	struct amdgpu_vm *vm;
> +	int ret = 0;
> +
> +	if (bo->vm_bo && bo->vm_bo->vm) {
> +		vm = bo->vm_bo->vm;
> +		if (vm->process_info && vm->process_info->eviction_fence) {

Better write that as checking of prerequisites, e.g. if (!...) return;

> +			BUG_ON(!dma_resv_trylock(&bo->tbo.base._resv));
> +			if (bo->tbo.base.resv != &bo->tbo.base._resv) {
> +				dma_resv_copy_fences(&bo->tbo.base._resv, bo->tbo.base.resv);
> +				bo->tbo.base.resv = &bo->tbo.base._resv;

That doesn't work correctly and could crash really really badly. We need 
to rework how deleted BOs are handled in TTM first for this.

Roughly a month or two ago I send out a patch set which does that, but I 
never got around to finish it up.

Regards,
Christian.

> +			}
> +			ret = amdgpu_amdkfd_remove_eviction_fence(bo, vm->process_info->eviction_fence);
> +			dma_resv_unlock(bo->tbo.base.resv);
> +		}
> +	}
> +	return ret;
> +}
> +
>   static int amdgpu_amdkfd_bo_validate(struct amdgpu_bo *bo, uint32_t domain,
>   				     bool wait)
>   {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 6f60a58..4b5bee0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -1307,6 +1307,9 @@
>   	if (abo->kfd_bo)
>   		amdgpu_amdkfd_unreserve_memory_limit(abo);
>   
> +	amdgpu_amdkfd_remove_fence_on_pt_pd_bos(abo);
> +	abo->vm_bo = NULL;
> +
>   	if (bo->mem.mem_type != TTM_PL_VRAM || !bo->mem.mm_node ||
>   	    !(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE))
>   		return;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index cc56eab..187cdb3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -945,7 +945,6 @@
>   static void amdgpu_vm_free_table(struct amdgpu_vm_pt *entry)
>   {
>   	if (entry->base.bo) {
> -		entry->base.bo->vm_bo = NULL;
>   		list_del(&entry->base.vm_status);
>   		amdgpu_bo_unref(&entry->base.bo->shadow);
>   		amdgpu_bo_unref(&entry->base.bo);



More information about the amd-gfx mailing list