[PATCH] drm/amdgpu/sriov add amdgpu_amdkfd_pre_reset in gpu reset

Felix Kuehling felix.kuehling at amd.com
Fri Apr 3 20:23:09 UTC 2020


Please separate the two fixes into separate commits.

I'd like to see a better explanation for the changes in
kgd_hqd_destroy.  The GFX9 version already has a return -EIO in case
it's in a GPU reset. I would agree with porting that to GFX10. But why
do we need to return 0 only in the SRIOV case?
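
To make the question concrete, here is a minimal standalone sketch of the
early-return order before and after the patch. It is not driver code:
struct mock_adev, HQD_PROCEED and both precheck functions are hypothetical
names introduced only for illustration, and the two flags stand in for
amdgpu_sriov_vf(adev) and adev->in_gpu_reset.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two device properties the patch tests. */
struct mock_adev {
	bool sriov_vf;      /* models amdgpu_sriov_vf(adev) */
	bool in_gpu_reset;  /* models adev->in_gpu_reset */
};

/* Hypothetical marker: no early return, the real HQD teardown would run. */
#define HQD_PROCEED 1

/* Existing GFX9 behavior: any GPU reset makes hqd_destroy fail with -EIO. */
static int hqd_destroy_precheck_existing(const struct mock_adev *adev)
{
	if (adev->in_gpu_reset)
		return -EIO;
	return HQD_PROCEED;
}

/* Patched GFX9 order: an SR-IOV VF in reset reports success instead. */
static int hqd_destroy_precheck_patched(const struct mock_adev *adev)
{
	if (adev->sriov_vf && adev->in_gpu_reset)
		return 0;
	if (adev->in_gpu_reset)
		return -EIO;
	return HQD_PROCEED;
}
```

With the patch, only the SR-IOV-during-reset case changes, from -EIO to 0;
bare metal in reset still gets -EIO. That difference is what needs to be
explained.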

Regards,
  Felix

On 2020-04-03 at 1:02 a.m., Jack Zhang wrote:
> kfd_pre_reset frees the mem_objs allocated by kfd_gtt_sa_allocate.
>
> Without this change, the SR-IOV TDR code path never frees that memory,
> which results in a memory leak.
>
> v2: add a bugfix for a KIQ ring test failure
>
> Signed-off-by: Jack Zhang <Jack.Zhang1 at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c | 3 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c  | 3 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c         | 2 ++
>  3 files changed, 8 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> index 4ec6d0c..bdc1f5a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> @@ -543,6 +543,9 @@ static int kgd_hqd_destroy(struct kgd_dev *kgd, void *mqd,
>  	uint32_t temp;
>  	struct v10_compute_mqd *m = get_mqd(mqd);
>  
> +	if (amdgpu_sriov_vf(adev) && adev->in_gpu_reset)
> +		return 0;
> +
>  #if 0
>  	unsigned long flags;
>  	int retry;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> index df841c2..c2562d6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> @@ -541,6 +541,9 @@ int kgd_gfx_v9_hqd_destroy(struct kgd_dev *kgd, void *mqd,
>  	uint32_t temp;
>  	struct v9_mqd *m = get_mqd(mqd);
>  
> +	if (amdgpu_sriov_vf(adev) && adev->in_gpu_reset)
> +		return 0;
> +
>  	if (adev->in_gpu_reset)
>  		return -EIO;
>  
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 8faaa17..e3f7441 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3854,6 +3854,8 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev,
>  	if (r)
>  		return r;
>  
> +	amdgpu_amdkfd_pre_reset(adev);
> +
>  	/* Resume IP prior to SMC */
>  	r = amdgpu_device_ip_reinit_early_sriov(adev);
>  	if (r)

