[PATCH 1/3] drm/v3d: Don't resubmit guilty CSD jobs
Chema Casanova
jmcasanova at igalia.com
Thu Feb 4 13:54:11 UTC 2021
I've tested the patch and confirmed that it applies cleanly on top of
drm-next. I've also confirmed that the timeout occurs with the test case
described by the developer:
https://github.com/raspberrypi/linux/pull/3816#issuecomment-682251862
Bearing in mind that this is my first review of a patch on the v3d kernel
side, I think this patch is fine.
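
As a reading aid, the guard under review can be sketched in plain userspace
C. This is only a sketch under stated assumptions: err_ptr(), is_err() and
ptr_err() are local stand-ins for the kernel's ERR_PTR(), IS_ERR() and
PTR_ERR(), and struct sketch_job, sketch_fence, sketch_job_run() and
sketch_selfcheck() are hypothetical names that model only the error check,
not the real v3d_csd_job_run().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(). */
#define MAX_ERRNO 4095

static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Hypothetical, simplified job: only the state the guard cares about. */
struct sketch_job {
	int finished_error;	/* models sched_job->s_fence->finished.error */
	int submitted;		/* how many times the job reached the "hardware" */
};

static int sketch_fence;	/* dummy object standing in for a dma_fence */

/*
 * Models the guard: a job whose finished fence already carries an error
 * is not submitted again; the error is returned as an error pointer.
 */
static void *sketch_job_run(struct sketch_job *job)
{
	int error = job->finished_error;

	if (error < 0) {
		fprintf(stderr,
			"Skipping job resubmission due to previous error (%d)\n",
			error);
		return err_ptr(error);
	}

	job->submitted++;
	return &sketch_fence;
}

/* Usage: a healthy job is submitted, a guilty one is skipped. */
static void sketch_selfcheck(void)
{
	struct sketch_job healthy = { 0, 0 };
	struct sketch_job guilty = { -ECANCELED, 0 };

	assert(!is_err(sketch_job_run(&healthy)));
	assert(healthy.submitted == 1);

	assert(ptr_err(sketch_job_run(&guilty)) == -ECANCELED);
	assert(guilty.submitted == 0);
}
```

The point of the sketch is that a job marked guilty never reaches the
submission step again, so it cannot trigger another reset on its own.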
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova at igalia.com>
On 3/9/20 18:48, Yukimasa Sugizaki wrote:
> From: Yukimasa Sugizaki <ysugi at idein.jp>
>
> The previous code misses a check for the timeout error set by
> drm_sched_resubmit_jobs(), which results in an infinite GPU reset loop
> once a timeout occurs:
>
> [ 178.799106] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* Resetting GPU for hang.
> [ 178.807836] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* V3D_ERR_STAT: 0x00001000
> [ 179.839132] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* Resetting GPU for hang.
> [ 179.847865] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* V3D_ERR_STAT: 0x00001000
> [ 180.879146] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* Resetting GPU for hang.
> [ 180.887925] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* V3D_ERR_STAT: 0x00001000
> [ 181.919188] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* Resetting GPU for hang.
> [ 181.928002] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* V3D_ERR_STAT: 0x00001000
> ...
>
> This commit adds the check for timeout as in v3d_{bin,render}_job_run():
>
> [ 66.408962] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* Resetting GPU for hang.
> [ 66.417734] v3d fec00000.v3d: [drm:v3d_reset [v3d]] *ERROR* V3D_ERR_STAT: 0x00001000
> [ 66.428296] [drm] Skipping CSD job resubmission due to previous error (-125)
>
> Here -125 is -ECANCELED. Users currently have no way to check whether
> the timeout has occurred other than inspecting dmesg.
>
> Signed-off-by: Yukimasa Sugizaki <ysugi at idein.jp>
> ---
> drivers/gpu/drm/v3d/v3d_sched.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
> index 0747614a78f0..001216f22017 100644
> --- a/drivers/gpu/drm/v3d/v3d_sched.c
> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
> @@ -226,6 +226,17 @@ v3d_csd_job_run(struct drm_sched_job *sched_job)
>  	struct dma_fence *fence;
>  	int i;
>
> +	/* This error is set to -ECANCELED by drm_sched_resubmit_jobs() if this
> +	 * job timed out more than sched_job->sched->hang_limit times.
> +	 */
> +	int error = sched_job->s_fence->finished.error;
> +
> +	if (unlikely(error < 0)) {
> +		DRM_WARN("Skipping CSD job resubmission due to previous error (%d)\n",
> +			 error);
> +		return ERR_PTR(error);
> +	}
> +
>  	v3d->csd_job = job;
>
>  	v3d_invalidate_caches(v3d);
> --
> 2.7.4