drm/sched: Replacement for drm_sched_resubmit_jobs() is deprecated

Wed May 3 08:16:24 UTC 2023

Hi Christian,

On Tue, 2 May 2023 14:41:32 +0200
Boris Brezillon <boris.brezillon at collabora.com> wrote:

> Hi Christian,
> 
> Thanks for your quick reply.
> 
> On Tue, 2 May 2023 13:36:07 +0200
> Christian König <christian.koenig at amd.com> wrote:
> 
> > Hi Boris,
> > 
> > Am 02.05.23 um 13:19 schrieb Boris Brezillon:  
> > > Hello Christian, Alex,
> > >
> > > As part of our transition to drm_sched for the powervr GPU driver, we
> > > realized drm_sched_resubmit_jobs(), which is used by all drivers
> > > relying on drm_sched right except amdgpu, has been deprecated.
> > > Unfortunately, commit 5efbe6aa7a0e ("drm/scheduler: deprecate
> > > drm_sched_resubmit_jobs") doesn't describe what drivers should do or use
> > > as an alternative.
> > >
> > > At the very least, for our implementation, we need to restore the
> > > drm_sched_job::parent pointers that were set to NULL in
> > > drm_sched_stop(), such that jobs submitted before the GPU recovery are
> > > considered active when drm_sched_start() is called. That could be done
> > > with a custom pending_list iteration restoring drm_sched_job::parent's
> > > pointer, but that seems odd to let the scheduler backend manipulate
> > > this list directly, and I suspect we need to do other checks, like the
> > > karma vs hang-limit thing, so we can flag the entity dirty and cancel
> > > all jobs being queued there if the entity has caused too many hangs.
> > >
> > > Now that drm_sched_resubmit_jobs() has been deprecated, that would be
> > > great if you could help us write a piece of documentation describing
> > > what should be done between drm_sched_stop() and drm_sched_start(), so
> > > new drivers don't come up with their own slightly different/broken
> > > version of the same thing.    
> > 
> > Yeah, really good point! The solution is to not use drm_sched_stop() and 
> > drm_sched_start() either.  
> 
> Okay. If that's what we're heading to, this should really be clarified
> in the job_timedout() method doc, because right now it's
> mentioning drm_sched_{start,stop,resubmit_jobs}(), with
> drm_sched_resubmit_jobs() being deprecated already.
> 
> > 
> > The general idea Daniel, the other Intel guys and me seem to have agreed 
> > on is to convert the scheduler thread into a work item.
> > 
> > This work item for pushing jobs to the hw can then be queued to the same 
> > workqueue we use for the timeout work item.
> > 
> > If this workqueue is now configured by your driver as single threaded 
> > you have a guarantee that only either the scheduler or the timeout work 
> > item is running at the same time. That in turn makes starting/stopping 
> > the scheduler for a reset completely superfluous.  
> 
> Makes sense.
> 
> > 
> > Patches for this has already been floating on the mailing list, but 
> > haven't been committed yet. Since this is all WIP.  
> 
> Assuming you're talking about [1], yes, I'm aware of this effort
> (PowerVR also has FW-side scheduling, which is what this patch series
> was trying to address initially). And I'm aware of the
> ordered-workqueue trick too, it helped fixing a few races in panfrost
> in the past.
> 
> > 
> > In general it's not really a good idea to change the scheduler and hw 
> > fences during GPU reset/recovery. The dma_fence implementation has a 
> > pretty strict state transition which clearly say that a dma_fence should 
> > never go back from signaled to unsignaled and when you start messing 
> > with that this is exactly what might happen.
> > 
> > What you can do is to save your hw state and re-start at the same 
> > location after handling the timeout.  
> 
> To sum-up, we shouldn't call drm_sched_{start,stop,resubmit_jobs}().

After the discussion I had with Matthew yesterday on IRC, I
realized there was no clear agreement on this. Matthew uses those 3
helpers in the Xe driver right now, and given he intends to use a
multi-threaded wq for its 1:1 schedulers run queue, there's no way he
can get away without calling drm_sched_{start,stop}().
drm_sched_resubmit_jobs() can be open-coded in each driver, but I'm
wondering if it wouldn't be preferable to add a ::resubmit_job() method
or extend the ::run_job() one to support the resubmit semantics, which,
AFAIU, is just about enforcing the job done fence (the one returned by
::run_job()) doesn't transition from a signaled to an unsignaled state.

But probably more important than providing a generic helper, we should
document the resubmit semantics (AKA, what should and/or shouldn't be
done with pending jobs when a recovery happens). Because forbidding
people to use a generic helper function doesn't give any guarantee that
they'll do the right thing when coding their own logic, unless we give
clues about what's considered right/wrong, and the current state of the
doc is pretty unclear in this regard.

Regards,

Boris