[PATCH v4 1/7] drm/v3d: Don't run jobs that have errors flagged in its fence
Maíra Canal
mcanal@igalia.com
Thu Mar 13 20:20:47 UTC 2025
On 13/03/25 11:43, Maíra Canal wrote:
> The V3D driver still relies on `drm_sched_increase_karma()` and
> `drm_sched_resubmit_jobs()` for resubmissions when a timeout occurs.
> The function `drm_sched_increase_karma()` marks the job as guilty, while
> `drm_sched_resubmit_jobs()` sets an error (-ECANCELED) in the DMA fence of
> that guilty job.
>
> Because of this, we must check whether the job’s DMA fence has been
> flagged with an error before executing the job. Otherwise, the same guilty
> job may be resubmitted indefinitely, causing repeated GPU resets.
>
> This patch adds a check for an error on the job's fence to prevent running
> a guilty job that was previously flagged when the GPU timed out.
>
> Note that the CPU and CACHE_CLEAN queues do not require this check, as
> their jobs are executed synchronously once the DRM scheduler starts them.
>
> Cc: stable@vger.kernel.org
> Fixes: d223f98f0209 ("drm/v3d: Add support for compute shader dispatch.")
> Fixes: 1584f16ca96e ("drm/v3d: Add support for submitting jobs to the TFU.")
> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
As patches 1/7 and 2/7 prevent the same faulty job from being
resubmitted in a loop, I just applied them to misc/kernel.git
(drm-misc-fixes).
Best Regards,
- Maíra
> ---
> drivers/gpu/drm/v3d/v3d_sched.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
> index 80466ce8c7df669280e556c0793490b79e75d2c7..c2010ecdb08f4ba3b54f7783ed33901552d0eba1 100644
> --- a/drivers/gpu/drm/v3d/v3d_sched.c
> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
> @@ -327,11 +327,15 @@ v3d_tfu_job_run(struct drm_sched_job *sched_job)
>  	struct drm_device *dev = &v3d->drm;
>  	struct dma_fence *fence;
>
> +	if (unlikely(job->base.base.s_fence->finished.error))
> +		return NULL;
> +
> +	v3d->tfu_job = job;
> +
>  	fence = v3d_fence_create(v3d, V3D_TFU);
>  	if (IS_ERR(fence))
>  		return NULL;
>
> -	v3d->tfu_job = job;
>  	if (job->base.irq_fence)
>  		dma_fence_put(job->base.irq_fence);
>  	job->base.irq_fence = dma_fence_get(fence);
> @@ -369,6 +373,9 @@ v3d_csd_job_run(struct drm_sched_job *sched_job)
>  	struct dma_fence *fence;
>  	int i, csd_cfg0_reg;
>
> +	if (unlikely(job->base.base.s_fence->finished.error))
> +		return NULL;
> +
>  	v3d->csd_job = job;
>
>  	v3d_invalidate_caches(v3d);
>
More information about the dri-devel mailing list