[PATCH 3/5] drm/sched: Warn if pending list is not empty

Danilo Krummrich dakr at kernel.org
Thu Apr 17 17:07:00 UTC 2025


On Thu, Apr 17, 2025 at 05:08:12PM +0100, Tvrtko Ursulin wrote:
> To catch up on why if you could dig out the links to past discussions it
> would be helpful.

I can't look it up currently, sorry. That's why I said Philipp will loop you in
once he's back. :)

> I repeat that there is a lot of attractiveness to reference counting: the
> already mentioned memory leak, the s_fence oops, and also not having to clear
> job->entity, which could be useful for things like tracking per-entity
> submission stats (imagine CFS-like scheduling, or a generic scheduling DRM
> cgroup controller). So it would be good for me to hear what pitfalls were
> identified in this space.

<snip>

> > > > > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > index 6b72278c4b72..ae3152beca14 100644
> > > > > > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > @@ -1465,6 +1465,10 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
> > > > > > >     	sched->ready = false;
> > > > > > >     	kfree(sched->sched_rq);
> > > > > > >     	sched->sched_rq = NULL;
> > > > > > > +
> > > > > > > +	if (!list_empty(&sched->pending_list))
> > > > > > > +		dev_err(sched->dev, "%s: Tearing down scheduler while jobs are pending!\n",
> > > > > > > +			__func__);
> > > > > 
> > > > > It isn't fair to add this error since it would out of the blue start firing
> > > > > for everyone except nouveau, no? Regardless of whether there is a leak or not.
> > > > 
> > > > I think it is pretty fair to warn when detecting a guaranteed bug, no?
> > > > 
> > > > If drm_sched_fini() is called while jobs are still on the pending_list, they won't
> > > > ever be freed, because all workqueues are stopped.
> > > 
> > > Is it a guaranteed bug for drivers that are aware of the drm_sched_fini()
> > > limitation and clean up after themselves?
> > 
> > How could a driver clean up after itself (unless the driver holds its own list of
> > pending jobs)?
> > 
> > Once a job is in flight (i.e. it's on the pending_list) we must guarantee that
> > free_job() is called by the scheduler, which it can't do if we call
> > drm_sched_fini() before the pending_list is empty.
> > 
> > > In other words if you apply the series up to here would it trigger for
> > > nouveau?
> > 
> > No, because nouveau does something very stupid, i.e. it replicates the pending_list.
> 
> Ah okay, I see it now: it waits for all jobs to finish before calling
> drm_sched_fini(). For some reason I did not think it was doing that, given
> that the cover letter starts with how that is a big no-no.
> 
> > > Reportedly it triggers for the mock scheduler which also has no
> > > leak.
> > 
> > That sounds impossible. How do you ensure you do *not* leak memory when you tear
> > down the scheduler while it still has pending jobs? Or in other words, who calls
> > free_job() if not the scheduler itself?
> 
> Well the cover letter says it triggers so it is possible. :)

I mean it should be impossible to have jobs on the pending_list when calling
drm_sched_fini() and yet not have memory leaks, unless you let the driver do
weird things, such as peeking into the scheduler's implementation details, etc.
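
Just to illustrate what a driver has to do today to guarantee that, here is a
minimal sketch of the "mirror the pending_list" pattern: the driver drops jobs
from its own list in free_job() and only tears the scheduler down once that
list has drained. All names (my_sched, my_job, job_list, job_wq, ...) are made
up; this is a sketch of the idea, not nouveau's actual code.

/* Assumes <drm/gpu_scheduler.h>, <linux/list.h>, <linux/spinlock.h>,
 * <linux/wait.h> and <linux/slab.h>. */
struct my_sched {
	struct drm_gpu_scheduler sched;
	struct list_head job_list;	/* driver's mirror of pending jobs */
	spinlock_t job_list_lock;
	wait_queue_head_t job_wq;
};

struct my_job {
	struct drm_sched_job base;
	struct list_head head;		/* entry on my_sched::job_list */
	struct my_sched *sched;
};

static void my_free_job(struct drm_sched_job *sched_job)
{
	struct my_job *job = container_of(sched_job, struct my_job, base);
	struct my_sched *s = job->sched;

	/* Drop the job from the driver's mirror list and wake up anyone
	 * waiting for that list to drain. */
	spin_lock(&s->job_list_lock);
	list_del(&job->head);
	spin_unlock(&s->job_list_lock);
	wake_up(&s->job_wq);

	drm_sched_job_cleanup(sched_job);
	kfree(job);
}

static void my_sched_teardown(struct my_sched *s)
{
	/* Wait until free_job() has run for every in-flight job ... */
	wait_event(s->job_wq, list_empty(&s->job_list));

	/* ... which also means the scheduler's pending_list is empty. */
	drm_sched_fini(&s->sched);
}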

> The mock scheduler also tracks the pending jobs itself, but unlike nouveau it
> does not wait for the jobs to finish and for the free worker to process them
> all; instead, having stopped the "hw" backend, it cancels them and calls the
> free_job vfunc directly.

That seems very wrong to me. This is exactly what I mean by the driver peeking
into implementation details.

If the API of a component requires its users to know about and modify the
component's internals, it's pretty much broken and must be fixed.
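
To make it concrete, the pattern I'm objecting to looks roughly like the sketch
below (purely illustrative, reusing the made-up my_sched/my_job structures from
above, not the actual mock scheduler code): the driver invokes the scheduler's
free_job callback directly for jobs that are still on the scheduler's
pending_list.

static void mock_like_teardown(struct my_sched *s)
{
	struct my_job *job, *next;

	my_hw_stop(s);	/* made-up helper: stop the "hw" backend */

	/* free_job() is the scheduler's callback, but here the driver calls
	 * it directly for jobs that are still sitting on the scheduler's
	 * pending_list. */
	list_for_each_entry_safe(job, next, &s->job_list, head)
		s->sched.ops->free_job(&job->base);

	/* Nothing is leaked, but pending_list is not empty here, so the new
	 * warning in drm_sched_fini() fires. */
	drm_sched_fini(&s->sched);
}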

> Going back to the topic of this series, if we go with a solution along the
> lines of the proposed one, I wonder if it would be doable without mandating
> that drivers keep a list parallel to pending_list. Instead have a vfunc the
> DRM scheduler would call to cancel one job at a time from *its* pending list.
> It would go nicely together with prepare/run/timedout/free.

That's pretty much what this series does, no? With the new callback the driver
is supposed to kill the corresponding fence context, which signals all
associated fences, and hence the pending list clears subsequently.
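
Roughly, on the driver side such a callback boils down to the sketch below (the
callback name and all driver structures are made up, the real interface is
whatever the series defines): signal every still outstanding hardware fence,
typically with an error, so the scheduler can subsequently process and free the
pending jobs itself.

/* Assumes <linux/dma-fence.h>; fence_list/fence_lock are assumed extra
 * members of the made-up my_sched, tracking all not yet signalled
 * hardware fences. */
struct my_hw_fence {
	struct dma_fence base;
	struct list_head head;	/* entry on my_sched::fence_list */
};

static void my_kill_fence_context(struct drm_gpu_scheduler *sched)
{
	struct my_sched *s = container_of(sched, struct my_sched, sched);
	struct my_hw_fence *f;

	spin_lock(&s->fence_lock);
	list_for_each_entry(f, &s->fence_list, head) {
		/* Mark the fence as cancelled, then signal it; the scheduler
		 * can now consider the corresponding job finished. */
		dma_fence_set_error(&f->base, -ECANCELED);
		dma_fence_signal(&f->base);
	}
	spin_unlock(&s->fence_lock);
}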

> Would it allow getting rid of the new state machinery and just cancelling
> and freeing in one go directly from drm_sched_fini()?

Again, I think that's what it does, unless I misunderstand you.
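
For completeness, the teardown flow then looks roughly like the sketch below
(illustrative only, not the actual patch; the kill_fence_context hook is a
made-up name): ask the driver to kill its fence context, then cancel and free
the remaining pending jobs in one go.

void drm_sched_fini(struct drm_gpu_scheduler *sched)
{
	struct drm_sched_job *job, *next;

	/* Made-up hook: have the driver kill its fence context, which signals
	 * (with an error) the fences of all jobs still in flight. */
	if (sched->ops->kill_fence_context)
		sched->ops->kill_fence_context(sched);

	/* With every hardware fence signalled, the pending jobs can be freed
	 * right away; nothing leaks and the warning never fires. */
	list_for_each_entry_safe(job, next, &sched->pending_list, list) {
		list_del_init(&job->list);
		sched->ops->free_job(job);
	}

	/* ... remainder of the existing teardown (stop work items, free
	 * sched_rq, etc.) ... */
}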

