[PATCH 1/3] drm/v3d: Don't resubmit guilty CSD jobs
Yukimasa Sugizaki
ysugi at idein.jp
Thu Sep 3 16:48:19 UTC 2020
From: Yukimasa Sugizaki <ysugi at idein.jp>
The previous code misses a check for the timeout error set by
drm_sched_resubmit_jobs(), which results in an infinite GPU reset loop
once a timeout occurs:
[ 178.799106] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* Resetting GPU for hang.
[ 178.807836] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* V3D_ERR_STAT: 0x00001000
[ 179.839132] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* Resetting GPU for hang.
[ 179.847865] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* V3D_ERR_STAT: 0x00001000
[ 180.879146] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* Resetting GPU for hang.
[ 180.887925] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* V3D_ERR_STAT: 0x00001000
[ 181.919188] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* Resetting GPU for hang.
[ 181.928002] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* V3D_ERR_STAT: 0x00001000
...
This commit adds the same timeout check that v3d_{bin,render}_job_run()
already perform. With it, the log after a hang becomes:
[ 66.408962] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* Resetting GPU for hang.
[ 66.417734] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* V3D_ERR_STAT: 0x00001000
[ 66.428296] [drm] Skipping CSD job resubmission due to previous error (-125)
Here -125 is -ECANCELED. Users currently have no way to tell that the
timeout has occurred other than inspecting dmesg.
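To illustrate why the missing check leads to an endless reset loop, here is
a minimal userspace sketch of the behaviour described above. It is not
kernel code: struct model_job, the model_* functions, and the max_resets
cap are hypothetical names made up for this illustration, and the
bookkeeping is a simplified assumption about how a job is marked with
-ECANCELED once it has timed out more than hang_limit times.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the job state; not a kernel type. */
struct model_job {
	int karma;          /* number of timeouts blamed on this job */
	int finished_error; /* models sched_job->s_fence->finished.error */
	int hw_runs;        /* how often the job was put on the hardware */
};

/* Models the timeout path: blame the job, mark it once over the limit. */
static void model_handle_timeout(struct model_job *job, int hang_limit)
{
	job->karma++;
	if (job->karma > hang_limit)
		job->finished_error = -ECANCELED;
}

/* Models a run_job callback. Returns true if the job was submitted. */
static bool model_run_job(struct model_job *job, bool check_error)
{
	assert(job != NULL);
	if (check_error && job->finished_error < 0)
		return false; /* skip the guilty job, as this patch does */
	job->hw_runs++;
	return true;
}

/*
 * Runs a job that always hangs and counts the resulting GPU resets.
 * max_resets only exists so that the unchecked case terminates.
 */
static int model_count_resets(int hang_limit, bool check_error, int max_resets)
{
	struct model_job job = { 0, 0, 0 };
	int resets = 0;
	bool running = model_run_job(&job, check_error);

	while (running && resets < max_resets) {
		resets++; /* the job hung, so the GPU is reset */
		model_handle_timeout(&job, hang_limit);
		running = model_run_job(&job, check_error);
	}
	return resets;
}
```

In this model, a run_job callback that ignores the error keeps resubmitting
the hanging job until the artificial cap is reached, while one that checks
the error stops after the job has exceeded hang_limit.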
Signed-off-by: Yukimasa Sugizaki <ysugi at idein.jp>
---
drivers/gpu/drm/v3d/v3d_sched.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 0747614a78f0..001216f22017 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -226,6 +226,17 @@ v3d_csd_job_run(struct drm_sched_job *sched_job)
 	struct dma_fence *fence;
 	int i;
 
+	/* This error is set to -ECANCELED by drm_sched_resubmit_jobs() if this
+	 * job timed out more than sched_job->sched->hang_limit times.
+	 */
+	int error = sched_job->s_fence->finished.error;
+
+	if (unlikely(error < 0)) {
+		DRM_WARN("Skipping CSD job resubmission due to previous error (%d)\n",
+			 error);
+		return ERR_PTR(error);
+	}
+
 	v3d->csd_job = job;
 
 	v3d_invalidate_caches(v3d);
--
2.7.4
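
For readers unfamiliar with the "return ERR_PTR(error);" idiom used in the
hunk above, the following userspace sketch imitates how a negative errno
can be carried in a pointer return value. The model_* helpers and
MODEL_MAX_ERRNO are hypothetical names for this illustration only; they
mimic the kernel's ERR_PTR(), IS_ERR() and PTR_ERR() but are not the
kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed bound on errno values that may be encoded in a pointer. */
#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno as a pointer into the top of the address space. */
static void *model_err_ptr(long error)
{
	assert(error < 0 && error >= -MODEL_MAX_ERRNO);
	return (void *)(intptr_t)error;
}

/* True if the pointer value lies in the range reserved for error codes. */
static int model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Recover the negative errno from an error pointer. */
static long model_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}
```

A caller that receives such a value can test it with model_is_err() before
dereferencing it, and recover -ECANCELED with model_ptr_err().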