[PATCH 2/2] drm/amdgpu: set job guilty if reset skipped

Andrey Grodzovsky Andrey.Grodzovsky at amd.com
Thu Jan 14 14:48:41 UTC 2021


Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>

Andrey

On 1/14/21 8:37 AM, Horace Chen wrote:
> If two jobs on two different rings time out within a very
> short period, the reset for the second job will be skipped
> because a reset is already in progress.
>
> But that doesn't mean the second job is not guilty, since it also
> timed out and can be a bad job. So before bailing out of the
> reset, we need to increase the karma for this job too.
>
> Signed-off-by: Horace Chen <horace.chen at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++++
>   1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index a28e138ac72c..d1112e29c8b4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4572,6 +4572,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>   		if (atomic_cmpxchg(&hive->in_reset, 0, 1) != 0) {
>   			DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress",
>   				job ? job->base.id : -1, hive->hive_id);
> +			if (job)
> +				drm_sched_increase_karma(&job->base);
>   			amdgpu_put_xgmi_hive(hive);
>   			return 0;
>   		}
> @@ -4596,6 +4598,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>   			dev_info(adev->dev, "Bailing on TDR for s_job:%llx, as another already in progress",
>   					job ? job->base.id : -1);
>   			r = 0;
> +			if (job)
> +				drm_sched_increase_karma(&job->base);
>   			goto skip_recovery;
>   		}
>   
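
The bail-out behavior described in the commit message can be illustrated outside the kernel with a minimal userspace sketch. It is not the driver code: `sketch_job`, `sketch_in_reset`, `sketch_try_recover()` and `sketch_finish_recover()` are hypothetical names, C11 atomics stand in for the kernel's `atomic_cmpxchg()`, and a plain counter stands in for what `drm_sched_increase_karma()` does to the job. The recovery work itself is not modeled.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a scheduler job; only a karma counter is modeled. */
struct sketch_job {
	int karma;
};

/* Hypothetical stand-in for the "reset in progress" flag (0 = idle, 1 = busy). */
static atomic_int sketch_in_reset;

/*
 * Returns 1 if the caller won the race and now owns the reset,
 * 0 if another reset was already in progress and the caller bailed out.
 */
static int sketch_try_recover(struct sketch_job *job)
{
	int expected = 0;

	if (!atomic_compare_exchange_strong(&sketch_in_reset, &expected, 1)) {
		/*
		 * Someone else is already resetting. The job still timed out,
		 * so it is marked guilty before the early return.
		 */
		if (job)
			job->karma++;
		return 0;
	}

	/* Reset owner: the actual recovery path is not modeled in this sketch. */
	return 1;
}

/* Clears the flag so that a later timeout can start a new reset. */
static void sketch_finish_recover(void)
{
	atomic_store(&sketch_in_reset, 0);
}
```

In this sketch, a second `sketch_try_recover()` call made while the flag is set returns 0 but still bumps the karma of the job passed in, and a `NULL` job is tolerated in the same way the patch guards the call with a `job` check.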