[PATCH] drm/amdgpu: Unmap CSA under SR-IOV in KFD path

Mon Apr 29 23:34:39 UTC 2019

I remember a past discussion to change the CSA allocation/mapping scheme 
to avoid this issue in the first place. Can adding the CSA to the VM be 
delayed a little to a point after the VM gets converted to a compute VM? 
Maybe the first command submission?
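
To make the ordering concrete, here is a minimal user-space sketch of the 
"map the CSA lazily" idea. It is not driver code: mock_vm, mock_fpriv, 
mock_open, mock_make_compute and mock_gfx_submit are hypothetical stand-ins 
invented for illustration, and the "VM must have no mappings" check only 
models the behaviour described in the patch below.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for per-file driver state; not the real amdgpu structs. */
struct mock_vm {
	int  num_mappings;	/* mappings currently present in the VM */
	bool is_compute;	/* set once the VM has been converted */
};

struct mock_fpriv {
	struct mock_vm vm;
	bool sriov_vf;		/* models amdgpu_sriov_vf(adev) */
	bool csa_mapped;	/* models fpriv->csa_va != NULL */
};

/* Open path: set up the VM but do not map the CSA yet. */
static void mock_open(struct mock_fpriv *fpriv, bool sriov_vf)
{
	fpriv->vm.num_mappings = 0;
	fpriv->vm.is_compute = false;
	fpriv->sriov_vf = sriov_vf;
	fpriv->csa_mapped = false;
}

/* Models the conversion check: only a VM without mappings can be converted. */
static int mock_make_compute(struct mock_fpriv *fpriv)
{
	if (fpriv->vm.num_mappings != 0)
		return -EINVAL;
	fpriv->vm.is_compute = true;
	return 0;
}

/* Submission path: map the CSA on first use, and never for a compute VM. */
static int mock_gfx_submit(struct mock_fpriv *fpriv)
{
	assert(fpriv != NULL);

	if (fpriv->sriov_vf && !fpriv->vm.is_compute && !fpriv->csa_mapped) {
		fpriv->vm.num_mappings++;	/* stands in for mapping the CSA */
		fpriv->csa_mapped = true;
	}
	return 0;
}
```

With this ordering, a file that is opened and immediately handed to KFD is 
still a clean VM at conversion time, so nothing has to be unmapped. A file 
that has already submitted graphics work would still fail the conversion.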

Regards,
   Felix

On 2019-04-28 6:25 a.m., Trigger Huang wrote:
> In the amdgpu open path, the CSA is mapped into the VM, so when KFD is
> opened, the call to amdgpu_vm_make_compute fails because the VM already
> has mappings and is therefore not a clean VM. As a result, creating the
> process VM object fails.
>
> The fix is to unmap the CSA before converting the VM. The CSA is not
> needed for the compute VF world switch.
>
> Signed-off-by: Trigger Huang <Trigger.Huang at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 10 ++++++++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c          |  2 +-
>   2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 697b8ef..e0bc457 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -956,6 +956,16 @@ int amdgpu_amdkfd_gpuvm_acquire_process_vm(struct kgd_dev *kgd,
>   	if (avm->process_info)
>   		return -EINVAL;
>   
> +	/* Delete CSA mapping to make sure this VM is a clean VM before
> +	 * converting VM
> +	 */
> +	if (amdgpu_sriov_vf(adev) && drv_priv->csa_va) {
> +		amdgpu_bo_reserve(adev->virt.csa_obj, true);
> +		amdgpu_vm_bo_rmv(adev, drv_priv->csa_va);
> +		drv_priv->csa_va = NULL;
> +		amdgpu_bo_unreserve(adev->virt.csa_obj);
> +	}
> +
>   	/* Convert VM into a compute VM */
>   	ret = amdgpu_vm_make_compute(adev, avm, pasid);
>   	if (ret)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index da7b4fe..361c2e5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -1069,7 +1069,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
>   
>   	amdgpu_vm_bo_rmv(adev, fpriv->prt_va);
>   
> -	if (amdgpu_sriov_vf(adev)) {
> +	if (amdgpu_sriov_vf(adev) && fpriv->csa_va) {
>   		/* TODO: how to handle reserve failure */
>   		BUG_ON(amdgpu_bo_reserve(adev->virt.csa_obj, true));
>   		amdgpu_vm_bo_rmv(adev, fpriv->csa_va);