[PATCH 2/2] drm/amdgpu: set job guilty if reset skipped
Andrey Grodzovsky
Andrey.Grodzovsky at amd.com
Tue Jan 19 14:55:54 UTC 2021
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
Andrey
On 1/19/21 7:22 AM, Horace Chen wrote:
> If 2 jobs on 2 different rings time out within a very short
> period, the reset for the second job will be skipped because the
> reset is already in progress.
>
> But that doesn't mean the second job is not guilty, since it
> also timed out and can be a bad job. So before bailing out
> of the reset, we need to increase the karma for this job too.
>
> Signed-off-by: Horace Chen <horace.chen at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 9574da3abc32..1d6ff9fe37de 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4574,6 +4574,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
> DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress",
> job ? job->base.id : -1, hive->hive_id);
> amdgpu_put_xgmi_hive(hive);
> + if (job)
> + drm_sched_increase_karma(&job->base);
> return 0;
> }
> mutex_lock(&hive->hive_lock);
> @@ -4617,6 +4619,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
> job ? job->base.id : -1);
> r = 0;
> /* even we skipped this reset, still need to set the job to guilty */
> + if (job)
> + drm_sched_increase_karma(&job->base);
> goto skip_recovery;
> }
>
More information about the amd-gfx mailing list