[PATCH] drm/amdkfd: fix restore worker race condition

Felix Kuehling felix.kuehling at amd.com
Thu May 21 16:11:03 UTC 2020


On 2020-05-21 at 10:42 a.m., Philip Yang wrote:
> In the free-memory-of-GPU path, remove the BO from validate_list to make
> sure the restore worker doesn't access the BO any more, then unregister
> the BO's MMU interval notifier. Otherwise, the restore worker will crash
> in the middle of validating BO user pages if the MMU interval notifier
> is gone.
>
> Signed-off-by: Philip Yang <Philip.Yang at amd.com>

Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>


> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index da8b31a53291..68e6e1bc8f3a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -1302,15 +1302,15 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>  		return -EBUSY;
>  	}
>  
> -	/* No more MMU notifiers */
> -	amdgpu_mn_unregister(mem->bo);
> -
>  	/* Make sure restore workers don't access the BO any more */
>  	bo_list_entry = &mem->validate_list;
>  	mutex_lock(&process_info->lock);
>  	list_del(&bo_list_entry->head);
>  	mutex_unlock(&process_info->lock);
>  
> +	/* No more MMU notifiers */
> +	amdgpu_mn_unregister(mem->bo);
> +
>  	ret = reserve_bo_and_cond_vms(mem, NULL, BO_VM_ALL, &ctx);
>  	if (unlikely(ret))
>  		return ret;
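
For illustration only, and not part of the patch: the ordering above can be
sketched as a small user-space model. All names here (model_bo,
model_process_info, model_free_memory, model_restore_worker) are
hypothetical stand-ins, a pthread mutex stands in for process_info->lock,
and a boolean stands in for the MMU interval notifier registration. The
point it tries to show is that once the BO is unlinked under the lock, a
worker that walks the list under the same lock can no longer reach a BO
whose notifier is gone.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a BO tracked on the validate list. */
struct model_bo {
	bool notifier_registered;	/* models the MMU interval notifier */
	struct model_bo *next;		/* singly linked validate list */
};

/* Hypothetical stand-in for the per-process info and its lock. */
struct model_process_info {
	pthread_mutex_t lock;
	struct model_bo *validate_list;
};

/*
 * Free path in the order the patch establishes:
 * 1. unlink the BO from the validate list while holding the lock,
 * 2. only then drop the notifier registration.
 */
static void model_free_memory(struct model_process_info *pi,
			      struct model_bo *bo)
{
	struct model_bo **pp;

	assert(pi != NULL && bo != NULL);

	pthread_mutex_lock(&pi->lock);
	for (pp = &pi->validate_list; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == bo) {
			*pp = bo->next;	/* unlink: the worker can no longer find it */
			break;
		}
	}
	pthread_mutex_unlock(&pi->lock);

	bo->notifier_registered = false;	/* "no more MMU notifiers" */
}

/*
 * Restore worker model: walks the list under the lock and counts BOs
 * whose notifier is already gone. In this model, a non-zero count
 * corresponds to the crash described in the commit message.
 */
static int model_restore_worker(struct model_process_info *pi)
{
	struct model_bo *bo;
	int missing = 0;

	pthread_mutex_lock(&pi->lock);
	for (bo = pi->validate_list; bo != NULL; bo = bo->next) {
		if (!bo->notifier_registered)
			missing++;
	}
	pthread_mutex_unlock(&pi->lock);

	return missing;
}
```

With the two steps in model_free_memory swapped, a worker running between
them could still find the BO on the list with notifier_registered already
false, which is the window the patch closes.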

