[PATCH v4 1/7] drm/v3d: Don't run jobs that have errors flagged in its fence

Maíra Canal mcanal at igalia.com
Thu Mar 13 20:20:47 UTC 2025


On 13/03/25 11:43, Maíra Canal wrote:
> The V3D driver still relies on `drm_sched_increase_karma()` and
> `drm_sched_resubmit_jobs()` for resubmissions when a timeout occurs.
> The function `drm_sched_increase_karma()` marks the job as guilty, while
> `drm_sched_resubmit_jobs()` sets an error (-ECANCELED) in the DMA fence of
> that guilty job.
> 
> Because of this, we must check whether the job’s DMA fence has been
> flagged with an error before executing the job. Otherwise, the same guilty
> job may be resubmitted indefinitely, causing repeated GPU resets.
> 
> This patch adds a check for an error on the job's fence to prevent running
> a guilty job that was previously flagged when the GPU timed out.
> 
> Note that the CPU and CACHE_CLEAN queues do not require this check, as
> their jobs are executed synchronously once the DRM scheduler starts them.
> 
> Cc: stable at vger.kernel.org
> Fixes: d223f98f0209 ("drm/v3d: Add support for compute shader dispatch.")
> Fixes: 1584f16ca96e ("drm/v3d: Add support for submitting jobs to the TFU.")
> Reviewed-by: Iago Toral Quiroga <itoral at igalia.com>
> Signed-off-by: Maíra Canal <mcanal at igalia.com>

As patches 1/7 and 2/7 prevent the same faulty job from being
resubmitted in a loop, I just applied them to misc/kernel.git
(drm-misc-fixes).

Best Regards,
- Maíra

> ---
>   drivers/gpu/drm/v3d/v3d_sched.c | 9 ++++++++-
>   1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
> index 80466ce8c7df669280e556c0793490b79e75d2c7..c2010ecdb08f4ba3b54f7783ed33901552d0eba1 100644
> --- a/drivers/gpu/drm/v3d/v3d_sched.c
> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
> @@ -327,11 +327,15 @@ v3d_tfu_job_run(struct drm_sched_job *sched_job)
>   	struct drm_device *dev = &v3d->drm;
>   	struct dma_fence *fence;
>   
> +	if (unlikely(job->base.base.s_fence->finished.error))
> +		return NULL;
> +
> +	v3d->tfu_job = job;
> +
>   	fence = v3d_fence_create(v3d, V3D_TFU);
>   	if (IS_ERR(fence))
>   		return NULL;
>   
> -	v3d->tfu_job = job;
>   	if (job->base.irq_fence)
>   		dma_fence_put(job->base.irq_fence);
>   	job->base.irq_fence = dma_fence_get(fence);
> @@ -369,6 +373,9 @@ v3d_csd_job_run(struct drm_sched_job *sched_job)
>   	struct dma_fence *fence;
>   	int i, csd_cfg0_reg;
>   
> +	if (unlikely(job->base.base.s_fence->finished.error))
> +		return NULL;
> +
>   	v3d->csd_job = job;
>   
>   	v3d_invalidate_caches(v3d);
> 
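
For readers who want to see the effect of the guard outside the kernel, the
following is a minimal userspace sketch, not the driver code. It assumes
hypothetical stand-in types (struct mock_fence, struct mock_job) and helpers
(mock_job_run, mock_timeout, mock_resubmit_cycle) that do not exist in the
kernel; only the idea of returning NULL when the job's fence already carries
an error, and the -ECANCELED value, are taken from the patch and its commit
message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct dma_fence and a V3D job; not kernel types. */
struct mock_fence {
	int error;			/* 0, or a negative errno once flagged */
};

struct mock_job {
	struct mock_fence finished;	/* mirrors s_fence->finished */
	struct mock_fence hw_fence;	/* fence handed back to the scheduler */
	int hw_kicks;			/* times the "hardware" was started */
};

/* Run callback: refuse to start a job whose fence already carries an error. */
static struct mock_fence *mock_job_run(struct mock_job *job)
{
	assert(job != NULL);

	if (job->finished.error)
		return NULL;		/* flagged job: skip it, do not hang again */

	job->hw_kicks++;
	return &job->hw_fence;
}

/* Timeout path: flag the job with -ECANCELED, as the commit message describes. */
static void mock_timeout(struct mock_job *job)
{
	assert(job != NULL);
	job->finished.error = -ECANCELED;
}

/* Run once, time out, then resubmit; returns how often the hardware was started. */
static int mock_resubmit_cycle(int resubmissions)
{
	struct mock_job job = { 0 };
	int i;

	mock_job_run(&job);		/* first run reaches the hardware */
	mock_timeout(&job);		/* the job hangs and gets flagged */

	for (i = 0; i < resubmissions; i++)
		mock_job_run(&job);	/* every resubmission is rejected */

	return job.hw_kicks;
}
```

In this sketch the hardware is started only for the first run, however many
times the flagged job is handed back to the run callback. Without the error
check in mock_job_run(), every resubmission would start the hardware again,
which is the reset loop the patch is meant to avoid.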


