[PATCH 1/4] drm/amdgpu:don't invoke srio-gpu-reset in gpu-reset

Liu, Monk Monk.Liu at amd.com
Mon May 8 09:10:30 UTC 2017


For SR-IOV use case, we call gpu reset under the case we have no choice ...

So many places like debug fs shouldn't a good reason to trigger gpu reset

You know that gpu reset under SR-IOV will have very big impact on all other VFs ...

BR Monk

-----Original Message-----
From: Christian König [mailto:deathsimple at vodafone.de] 
Sent: Monday, May 08, 2017 5:08 PM
To: Liu, Monk <Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org
Subject: Re: [PATCH 1/4] drm/amdgpu:don't invoke srio-gpu-reset in gpu-reset

Am 08.05.2017 um 08:51 schrieb Monk Liu:
> because we don't want to do sriov-gpu-reset under certain cases, so 
> just split those two funtion and don't invoke sr-iov one from 
> bare-metal one.
>
> Change-Id: I641126c241e2ee2dfd54e6d16c389b159f99cfe0
> Signed-off-by: Monk Liu <Monk.Liu at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 3 ++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    | 2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c    | 3 ++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    | 6 +++++-
>   5 files changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 45a60a6..4985a7e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2652,9 +2652,6 @@ int amdgpu_gpu_reset(struct amdgpu_device *adev)
>   	int resched;
>   	bool need_full_reset;
>   
> -	if (amdgpu_sriov_vf(adev))
> -		return amdgpu_sriov_gpu_reset(adev, true);
> -
>   	if (!amdgpu_check_soft_reset(adev)) {
>   		DRM_INFO("No hardware hang detected. Did some blocks stall?\n");
>   		return 0;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index 5772ef2..d7523d1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -651,7 +651,8 @@ static int amdgpu_debugfs_gpu_reset(struct seq_file *m, void *data)
>   	struct amdgpu_device *adev = dev->dev_private;
>   
>   	seq_printf(m, "gpu reset\n");
> -	amdgpu_gpu_reset(adev);
> +	if (!amdgpu_sriov_vf(adev))
> +		amdgpu_gpu_reset(adev);

Well that is clearly not a good idea. Why do you want to disable the reset here?

>   
>   	return 0;
>   }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index 67be795..5bcbea0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -221,7 +221,7 @@ void amdgpu_gem_object_close(struct drm_gem_object 
> *obj,
>   
>   static int amdgpu_gem_handle_lockup(struct amdgpu_device *adev, int r)
>   {
> -	if (r == -EDEADLK) {
> +	if (r == -EDEADLK && !amdgpu_sriov_vf(adev)) {

Not a problem of your patch, but that stuff is outdated and should have been removed completely years ago. Going to take care of that.

>   		r = amdgpu_gpu_reset(adev);
>   		if (!r)
>   			r = -EAGAIN;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> index f8a6c95..49c6e6e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> @@ -89,7 +89,8 @@ static void amdgpu_irq_reset_work_func(struct work_struct *work)
>   	struct amdgpu_device *adev = container_of(work, struct amdgpu_device,
>   						  reset_work);
>   
> -	amdgpu_gpu_reset(adev);
> +	if (!amdgpu_sriov_vf(adev))
> +		amdgpu_gpu_reset(adev);

Mhm, that disables the reset on an invalid register access or invalid command stream. Is that really what we want?

Christian.

>   }
>   
>   /* Disable *all* interrupts */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> index 690ef3d..c7718af 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> @@ -36,7 +36,11 @@ static void amdgpu_job_timedout(struct amd_sched_job *s_job)
>   		  job->base.sched->name,
>   		  atomic_read(&job->ring->fence_drv.last_seq),
>   		  job->ring->fence_drv.sync_seq);
> -	amdgpu_gpu_reset(job->adev);
> +
> +	if (amdgpu_sriov_vf(job->adev))
> +		amdgpu_sriov_gpu_reset(job->adev, true);
> +	else
> +		amdgpu_gpu_reset(job->adev);
>   }
>   
>   int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,




More information about the amd-gfx mailing list