[PATCH V7 5/9] drm/amdgpu: Update amdgpu_job_timedout to check if the ring is guilty
jesse.zhang at amd.com
Thu Feb 13 05:47:11 UTC 2025
From: "Jesse.zhang at amd.com" <Jesse.zhang at amd.com>
Update `amdgpu_job_timedout` to check whether the ring is actually
guilty of causing the timeout. If GPU recovery is enabled and the
ring's `is_guilty` callback reports that the ring is not guilty, log
an error and skip error handling and fence completion.
Suggested-by: Alex Deucher <alexander.deucher at amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang at amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 100f04475943..f94c876db72b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -101,6 +101,16 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
 		/* Effectively the job is aborted as the device is gone */
 		return DRM_GPU_SCHED_STAT_ENODEV;
 	}
+	/* Check if the ring is actually guilty of causing the timeout.
+	 * If not, skip error handling and fence completion.
+	 */
+	if (amdgpu_gpu_recovery && ring->funcs->is_guilty) {
+		if (!ring->funcs->is_guilty(ring)) {
+			dev_err(adev->dev, "ring %s timeout, but not guilty\n",
+				s_job->sched->name);
+			goto exit;
+		}
+	}
 	/*
 	 * Do the coredump immediately after a job timeout to get a very
--
2.25.1
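
For readers who want to try the control flow outside the kernel, the
guard added above can be sketched as a small userspace model. This is
only an illustration under stated assumptions: every `mock_*` name is
hypothetical and is not part of amdgpu, `mock_gpu_recovery` merely
stands in for the `amdgpu_gpu_recovery` parameter, and the return
value stands in for the patch's `goto exit` versus falling through to
the existing recovery path.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the driver types; not the real amdgpu structs. */
struct mock_ring;

struct mock_ring_funcs {
	/* Optional callback: returns true if this ring caused the hang. */
	bool (*is_guilty)(struct mock_ring *ring);
};

struct mock_ring {
	const char *name;
	const struct mock_ring_funcs *funcs;
	bool hung; /* toy state consulted by the example callback */
};

enum mock_timeout_action {
	MOCK_SKIP_RECOVERY, /* models "goto exit" in the patch */
	MOCK_DO_RECOVERY,   /* models falling through to recovery */
};

/* Assumption: models the amdgpu_gpu_recovery module parameter. */
static int mock_gpu_recovery = 1;

/* Example callback: a ring is guilty only if it is marked as hung. */
static bool mock_is_guilty(struct mock_ring *ring)
{
	return ring->hung;
}

/* Mirrors the shape of the check added to amdgpu_job_timedout(). */
static enum mock_timeout_action mock_job_timedout(struct mock_ring *ring)
{
	if (mock_gpu_recovery && ring->funcs->is_guilty) {
		if (!ring->funcs->is_guilty(ring)) {
			fprintf(stderr, "ring %s timeout, but not guilty\n",
				ring->name);
			return MOCK_SKIP_RECOVERY;
		}
	}
	return MOCK_DO_RECOVERY;
}
```

In this model, a ring whose callback is NULL always takes the recovery
path, which matches the patch leaving rings without `is_guilty`
unchanged.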