[PATCH 2/2] drm/amdgpu: set job guilty if reset skipped
Andrey Grodzovsky
Andrey.Grodzovsky at amd.com
Thu Jan 14 14:48:41 UTC 2021
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
Andrey
On 1/14/21 8:37 AM, Horace Chen wrote:
> If two jobs on two different rings time out within a very
> short period, the reset for the second job will be skipped
> because a reset is already in progress.
>
> But that doesn't mean the second job is not guilty, since it also
> timed out and may be a bad job. So before bailing out of the
> reset, we need to increase the karma for this job too.
>
> Signed-off-by: Horace Chen <horace.chen at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index a28e138ac72c..d1112e29c8b4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4572,6 +4572,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>  		if (atomic_cmpxchg(&hive->in_reset, 0, 1) != 0) {
>  			DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress",
>  				job ? job->base.id : -1, hive->hive_id);
> +			if(job)
> +				drm_sched_increase_karma(&job->base);
>  			amdgpu_put_xgmi_hive(hive);
>  			return 0;
>  		}
> @@ -4596,6 +4598,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>  			dev_info(adev->dev, "Bailing on TDR for s_job:%llx, as another already in progress",
>  				 job ? job->base.id : -1);
>  			r = 0;
> +			if(job)
> +				drm_sched_increase_karma(&job->base);
>  			goto skip_recovery;
>  		}
>
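
For readers following the bail-out logic, here is a minimal user-space
sketch of the behaviour the patch aims for. It is not driver code: the
sketch_* types and functions are hypothetical stand-ins, the karma field
only models the effect of drm_sched_increase_karma(), and C11 atomics
stand in for the kernel's atomic_cmpxchg(). It also assumes that the
caller which wins the reset blames its own job, which is what the "too"
in the commit message suggests.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures; not the real amdgpu types. */
struct sketch_job {
	unsigned long long id;
	int karma;		/* how many times this job was blamed for a timeout */
};

struct sketch_device {
	atomic_int in_reset;	/* 0 = idle, 1 = a reset is in progress */
};

/* Models the effect of drm_sched_increase_karma(): blame the job. */
static void sketch_increase_karma(struct sketch_job *job)
{
	job->karma++;
}

/*
 * Models the timeout handler's entry path.
 * Returns 1 if this caller won the reset and has to perform it,
 * 0 if another reset was already in progress and this one was skipped.
 */
static int sketch_gpu_recover(struct sketch_device *dev, struct sketch_job *job)
{
	int expected = 0;

	if (!atomic_compare_exchange_strong(&dev->in_reset, &expected, 1)) {
		/* Reset skipped, but the job still timed out: blame it anyway. */
		if (job)
			sketch_increase_karma(job);
		return 0;
	}

	/* Assumption: the normal recovery path also blames the job. */
	if (job)
		sketch_increase_karma(job);

	/* The actual reset work would happen here. */
	return 1;
}

/* Marks the reset as finished so that later timeouts can start a new one. */
static void sketch_reset_done(struct sketch_device *dev)
{
	int was = atomic_exchange(&dev->in_reset, 0);

	assert(was == 1);
	(void)was;	/* avoid an unused-variable warning under NDEBUG */
}
```

Calling sketch_gpu_recover() for job A and then for job B before
sketch_reset_done() runs leaves both jobs with a karma of 1, even though
only the first call actually owns the reset. Without the extra increment
in the bail-out branch, job B would stay at 0.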