[PATCH 1/8] drm/sched: Allow drivers to skip the reset and keep on running

Philipp Stanner phasta at mailbox.org
Mon May 12 09:24:48 UTC 2025


On Wed, 2025-05-07 at 13:50 +0100, Tvrtko Ursulin wrote:
> 
> On 07/05/2025 13:33, Maíra Canal wrote:
> > Hi Tvrtko,
> > 
> > Thanks for the review!
> > 
> > On 06/05/25 08:32, Tvrtko Ursulin wrote:
> > > 
> > > On 03/05/2025 21:59, Maíra Canal wrote:
> > > > When the DRM scheduler times out, it's possible that the GPU
> > > > isn't hung; instead, a job may still be running, and there may
> > > > be no valid reason to reset the hardware. This can occur in two
> > > > situations:
> > > > 
> > > >    1. The GPU exposes some mechanism that ensures the GPU is
> > > >       still making progress. By checking this mechanism, we can
> > > >       safely skip the reset, rearm the timeout, and allow the
> > > >       job to continue running until completion. This is the
> > > >       case for v3d and Etnaviv.
> > > >    2. TDR has fired before the IRQ that signals the fence.
> > > >       Consequently, the job actually finishes, but it triggers
> > > >       a timeout before signaling the completion fence.
> > > > 
> > > > These two scenarios are problematic because we remove the job
> > > > from the `sched->pending_list` before calling
> > > > `sched->ops->timedout_job()`. This means that when the job
> > > > finally signals completion (e.g. in the IRQ handler), the
> > > > scheduler won't call `sched->ops->free_job()`. As a result, the
> > > > job and its resources won't be freed, leading to a memory leak.
> > > > 
> > > > To resolve this issue, we create a new `drm_gpu_sched_stat`
> > > > that allows a driver to skip the reset. This new status will
> > > > indicate that the job should be reinserted into the pending
> > > > list, and the driver will still signal its completion.
> > > 
> > 
> > [...]
> > 
> > > > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > > > index 1a7e377d4cbb4fc12ed93c548b236970217945e8..fe9043b6d43141bee831b5fc16b927202a507d51 100644
> > > > --- a/include/drm/gpu_scheduler.h
> > > > +++ b/include/drm/gpu_scheduler.h
> > > > @@ -389,11 +389,13 @@ struct drm_sched_job {
> > > >    * @DRM_GPU_SCHED_STAT_NONE: Reserved. Do not use.
> > > >    * @DRM_GPU_SCHED_STAT_NOMINAL: Operation succeeded.
> > > >    * @DRM_GPU_SCHED_STAT_ENODEV: Error: Device is not available anymore.
> > > > + * @DRM_GPU_SCHED_STAT_RUNNING: GPU is still running, so skip the reset.
> > > 
> > > s/GPU/job/ ?
> > > 
> > > >    */
> > > >   enum drm_gpu_sched_stat {
> > > >       DRM_GPU_SCHED_STAT_NONE,
> > > >       DRM_GPU_SCHED_STAT_NOMINAL,
> > > >       DRM_GPU_SCHED_STAT_ENODEV,
> > > > +    DRM_GPU_SCHED_STAT_RUNNING,
> > > 
> > > I am wondering if we could make it more obvious what the
> > > difference is between "nominal" and "running", and from whose
> > > point of view those statuses should be considered.
> > > 
> > > So far we have "nominal", which means the scheduler/hardware is
> > > working fine but the job may or may not have been cancelled. With
> > > "running" we kind of split it into two sub-statuses, and it would
> > > be nice for that to be intuitively visible from the naming. But I
> > > struggle to suggest an elegant name while preserving nominal as
> > > is.
> > 
> > I was thinking: how about changing DRM_GPU_SCHED_STAT_NOMINAL to
> > DRM_GPU_SCHED_STAT_RESET (the hardware is fine, but we reset it)?
> > 
> > Then, when we skip the reset, we would have
> > DRM_GPU_SCHED_STAT_NOMINAL (which means the hardware is fine and we
> > didn't reset it).
> > 
> > I'm open to other suggestions.
> 
> DRM_GPU_SCHED_STAT_RESET sounds like a good name and seems to paint a
> consistent story between running - reset - enodev.
> 
> > > Thinking out loud here - perhaps that is pointing towards an
> > > alternative: instead of a new status, a new helper to re-insert
> > > the single job (like drm_sched_resubmit_job(sched, job)) would
> > > fit better? Although it would be more churn.
> > > 
> > 
> > Although your solution might be more elegant, I'm worried that such
> > a function could be used improperly by new users (e.g. being called
> > in contexts other than `timedout_job()`).
> 
> We could call it drm_sched_untimedout_job(). </humour>
> 
> > I'd prefer to have a new status as it'll be used solely for
> > `timedout_job()` (making it harder for users to use it
> > inappropriately). With the addition of Matthew's feedback (calling
> > `drm_sched_run_free_queue()` after adding the job to the pending
> > list), I think it makes even more sense to keep it inside the
> > timeout function.
> > 
> > I hope others can chime in and give their opinions about your idea.
> 
> Yeah - Philipp - Danilo - what do you prefer? A third enum value, or
> a new helper?

I'm also afraid that providing yet another helper for this specific
case opens the door to abuse. We had (and still have) issues with the
familiar drm_sched_resubmit_jobs() function. Christian has been very
clear that this was a bad idea, and I'd rather not walk a road that
looks similar to that one.

I tend to think that the status codes are the appropriate mechanism to
address this. They were, after all, invented to inform the scheduler
about what is going on inside the driver.
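
For illustration, here's a rough sketch of how a driver's
timedout_job() callback could use such a status to report back. The
foo_* names and the progress check are made up to show the flow; the
status name is the one proposed in this series:

#include <drm/gpu_scheduler.h>

static enum drm_gpu_sched_stat
foo_timedout_job(struct drm_sched_job *sched_job)
{
	struct foo_job *job = to_foo_job(sched_job);

	/* Driver-specific check, e.g. a progress counter like v3d's. */
	if (foo_gpu_made_progress(job))
		/*
		 * Not actually hung: the scheduler re-inserts the job
		 * into the pending list and rearms the timeout.
		 */
		return DRM_GPU_SCHED_STAT_RUNNING;

	/* Real hang: reset and report the usual status. */
	foo_gpu_reset(job);
	return DRM_GPU_SCHED_STAT_NOMINAL;
}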

That said, currently ENODEV is basically the only error status, and
everything other than ENODEV (i.e., NOMINAL) is the "OK state".

A timeout occurring while the GPU is not hung is, therefore, also "OK".
Whatever the name ends up being, the documentation for NOMINAL must be
adjusted as well.

How about calling it "NORMAL" instead of "NOMINAL", since that state
actually describes what is both OK and "the norm", i.e., most commonly
the case?

And I wouldn't call it RUNNING, since the GPU is also running in the
NOMINAL state. "NO_HANG" would hint more effectively at the fact that
the GPU is, contrary to the scheduler's belief, not hanging.
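
Just to make that concrete (purely a naming sketch, not a finished
proposal), the enum would then read something like:

enum drm_gpu_sched_stat {
	DRM_GPU_SCHED_STAT_NONE,	/* Reserved. Do not use. */
	DRM_GPU_SCHED_STAT_NORMAL,	/* Timeout handled; scheduler/HW fine. */
	DRM_GPU_SCHED_STAT_ENODEV,	/* Device is not available anymore. */
	DRM_GPU_SCHED_STAT_NO_HANG,	/* Job still running; skip the reset. */
};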

(I've been out for a few days and am catching up on a lot of things;
I've only just had time to dig deeper into this series. Apologies if my
picture isn't complete yet.)

Thanks,
P.


> 
> Regards,
> 
> Tvrtko
> 


