[PATCH 1/8] drm/sched: Allow drivers to skip the reset and keep on running

Tue May 6 11:32:02 UTC 2025

On 03/05/2025 21:59, Maíra Canal wrote:
> When the DRM scheduler times out, it's possible that the GPU isn't hung;
> instead, a job may still be running, and there may be no valid reason to
> reset the hardware. This can occur in two situations:
> 
>    1. The GPU exposes some mechanism that ensures the GPU is still making
>       progress. By checking this mechanism, we can safely skip the reset,
>       rearm the timeout, and allow the job to continue running until
>       completion. This is the case for v3d and Etnaviv.
>    2. TDR has fired before the IRQ that signals the fence. Consequently,
>       the job actually finishes, but it triggers a timeout before signaling
>       the completion fence.
> 
> These two scenarios are problematic because we remove the job from the
> `sched->pending_list` before calling `sched->ops->timedout_job()`. This
> means that when the job finally signals completion (e.g. in the IRQ
> handler), the scheduler won't call `sched->ops->free_job()`. As a result,
> the job and its resources won't be freed, leading to a memory leak.
> 
> To resolve this issue, we create a new `drm_gpu_sched_stat` that allows a
> driver to skip the reset. This new status will indicate that the job
> should be reinserted into the pending list, and the driver will still
> signal its completion.

Since this is de facto what drivers do today I agree it makes sense to 
formalise handling for it in the scheduler itself.

Acked-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>

Some minor comments below.

> Signed-off-by: Maíra Canal <mcanal at igalia.com>
> ---
>   drivers/gpu/drm/scheduler/sched_main.c | 14 ++++++++++++++
>   include/drm/gpu_scheduler.h            |  2 ++
>   2 files changed, 16 insertions(+)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 829579c41c6b5d8b2abce5ad373c7017469b7680..68ca827d77e32187a034309f881135dbc639a9b4 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -568,6 +568,17 @@ static void drm_sched_job_timedout(struct work_struct *work)
>   			job->sched->ops->free_job(job);
>   			sched->free_guilty = false;
>   		}
> +
> +		/*
> +		 * If the driver indicated that the GPU is still running and wants to skip
> +		 * the reset, reinsert the job back into the pending list and realarm the

re-arm

> +		 * timeout.
> +		 */
> +		if (status == DRM_GPU_SCHED_STAT_RUNNING) {
> +			spin_lock(&sched->job_list_lock);
> +			list_add(&job->list, &sched->pending_list);
> +			spin_unlock(&sched->job_list_lock);
> +		}
>   	} else {
>   		spin_unlock(&sched->job_list_lock);
>   	}
> @@ -590,6 +601,9 @@ static void drm_sched_job_timedout(struct work_struct *work)
>    * This function is typically used for reset recovery (see the docu of
>    * drm_sched_backend_ops.timedout_job() for details). Do not call it for
>    * scheduler teardown, i.e., before calling drm_sched_fini().
> + *
> + * As it's used for reset recovery, drm_sched_stop() shouldn't be called
> + * if the scheduler skipped the timeout (DRM_SCHED_STAT_RUNNING).

s/scheduler/driver/ ?

>    */
>   void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
>   {
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 1a7e377d4cbb4fc12ed93c548b236970217945e8..fe9043b6d43141bee831b5fc16b927202a507d51 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -389,11 +389,13 @@ struct drm_sched_job {
>    * @DRM_GPU_SCHED_STAT_NONE: Reserved. Do not use.
>    * @DRM_GPU_SCHED_STAT_NOMINAL: Operation succeeded.
>    * @DRM_GPU_SCHED_STAT_ENODEV: Error: Device is not available anymore.
> + * @DRM_GPU_SCHED_STAT_RUNNING: GPU is still running, so skip the reset.

s/GPU/job/ ?

>    */
>   enum drm_gpu_sched_stat {
>   	DRM_GPU_SCHED_STAT_NONE,
>   	DRM_GPU_SCHED_STAT_NOMINAL,
>   	DRM_GPU_SCHED_STAT_ENODEV,
> +	DRM_GPU_SCHED_STAT_RUNNING,

I am wondering if we could make it more obvious what is the difference 
between "nominal" and "running" and from whose point of view should 
those statuses be considered.

So far we have "nominal" which means scheduler/hardware is working fine 
but the job may or may have not been cancelled. With "running" we kind 
of split it into two sub-statuses and it would be nice for that to be 
intuitively visible from the naming. But I struggle to suggest an 
elegant name while preserving nominal as is.

Thinking out loud here - perhaps that is pointing towards an alternative 
that instead of a new status, a new helper to re-insert the single job 
(like drm_sched_resubmit_job(sched, job)) would fit better? Although it 
would be more churn.

Regards,

Tvrtko

>   };
>   
>   /**
>