[PATCH v2 2/6] drm/sched: Prevent teardown waitque from blocking too long
Tvrtko Ursulin
tvrtko.ursulin at igalia.com
Fri May 16 10:34:04 UTC 2025
On 16/05/2025 10:54, Philipp Stanner wrote:
> On Fri, 2025-05-16 at 10:33 +0100, Tvrtko Ursulin wrote:
>>
>> On 24/04/2025 10:55, Philipp Stanner wrote:
>>> The waitqueue that ensures that drm_sched_fini() blocks until the
>>> pending_list has become empty could theoretically cause that
>>> function to
>>> block for a very long time. That, ultimately, could block userspace
>>> processes and prevent them from being killable through SIGKILL.
>>>
>>> When a driver calls drm_sched_fini(), it is safe to assume that all
>>> still pending jobs are not needed anymore anyway. Thus, they can
>>> be
>>> cancelled and thereby it can be ensured that drm_sched_fini() will
>>> return relatively quickly.
>>>
>>> Implement a new helper to stop all work items / submission except
>>> for
>>> the drm_sched_backend_ops.run_job().
>>>
>>> Implement a driver callback, kill_fence_context(), that instructs
>>> the
>>> driver to kill the fence context associated with this scheduler,
>>> thereby
>>> causing all pending hardware fences to be signalled.
>>>
>>> Call those new routines in drm_sched_fini() and ensure backwards
>>> compatibility if the new callback is not implemented.
>>>
>>> Suggested-by: Danilo Krummrich <dakr at redhat.com>
>>> Signed-off-by: Philipp Stanner <phasta at kernel.org>
>>> ---
>>> drivers/gpu/drm/scheduler/sched_main.c | 47 +++++++++++++++++---------
>>> include/drm/gpu_scheduler.h | 11 ++++++
>>> 2 files changed, 42 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index ea82e69a72a8..c2ad6c70bfb6 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -1390,31 +1390,46 @@ drm_sched_no_jobs_pending(struct drm_gpu_scheduler *sched)
>>> return empty;
>>> }
>>>
>>> +/**
>>> + * drm_sched_cancel_jobs_and_wait - trigger freeing of all pending jobs
>>> + * @sched: scheduler instance
>>> + *
>>> + * Must only be called if &struct drm_sched_backend_ops.kill_fence_context is
>>> + * implemented.
>>> + *
>>> + * Instructs the driver to kill the fence context associated with this scheduler,
>>> + * thereby signalling all pending fences. This, in turn, will trigger
>>> + * &struct drm_sched_backend_ops.free_job to be called for all pending jobs.
>>> + * The function then blocks until all pending jobs have been freed.
>>> + */
>>> +static inline void
>>> +drm_sched_cancel_jobs_and_wait(struct drm_gpu_scheduler *sched)
>>> +{
>>> + sched->ops->kill_fence_context(sched);
>>> + wait_event(sched->pending_list_waitque, drm_sched_no_jobs_pending(sched));
>>> +}
>>> +
>>> +
>>> /**
>>> * drm_sched_fini - Destroy a gpu scheduler
>>> *
>>> * @sched: scheduler instance
>>> *
>>> - * Tears down and cleans up the scheduler.
>>> - *
>>> - * Note that this function blocks until all the fences returned by
>>> - * &struct drm_sched_backend_ops.run_job have been signalled.
>>> + * Tears down and cleans up the scheduler. Might leak memory if
>>> + * &struct drm_sched_backend_ops.kill_fence_context is not implemented.
>>> */
>>> void drm_sched_fini(struct drm_gpu_scheduler *sched)
>>> {
>>> struct drm_sched_entity *s_entity;
>>> int i;
>>>
>>> - /*
>>> - * Jobs that have neither been scheduled or which have timed out are
>>> - * gone by now, but jobs that have been submitted through
>>> - * backend_ops.run_job() and have not yet terminated are still pending.
>>> - *
>>> - * Wait for the pending_list to become empty to avoid leaking those jobs.
>>> - */
>>> - drm_sched_submission_and_timeout_stop(sched);
>>> - wait_event(sched->pending_list_waitque, drm_sched_no_jobs_pending(sched));
>>> - drm_sched_free_stop(sched);
>>> + if (sched->ops->kill_fence_context) {
>>> + drm_sched_submission_and_timeout_stop(sched);
>>> + drm_sched_cancel_jobs_and_wait(sched);
>>> + drm_sched_free_stop(sched);
>>> + } else {
>>> + /* We're in "legacy free-mode" and ignore potential mem leaks */
>>> + drm_sched_wqueue_stop(sched);
>>> + }
>>>
>>> for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
>>> struct drm_sched_rq *rq = sched->sched_rq[i];
>>> @@ -1502,7 +1517,7 @@ bool drm_sched_wqueue_ready(struct drm_gpu_scheduler *sched)
>>> EXPORT_SYMBOL(drm_sched_wqueue_ready);
>>>
>>> /**
>>> - * drm_sched_wqueue_stop - stop scheduler submission
>>> + * drm_sched_wqueue_stop - stop scheduler submission and freeing
>>
>> Looks like the kerneldoc corrections (below too) belong to the
>> previous
>> patch. Irrelevant if you decide to squash them though.
>>
>>> * @sched: scheduler instance
>>> *
>>> * Stops the scheduler from pulling new jobs from entities. It also stops
>>> @@ -1518,7 +1533,7 @@ void drm_sched_wqueue_stop(struct drm_gpu_scheduler *sched)
>>> EXPORT_SYMBOL(drm_sched_wqueue_stop);
>>>
>>> /**
>>> - * drm_sched_wqueue_start - start scheduler submission
>>> + * drm_sched_wqueue_start - start scheduler submission and freeing
>>> * @sched: scheduler instance
>>> *
>>> * Restarts the scheduler after drm_sched_wqueue_stop() has stopped it.
>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>> index d0b1f416b4d9..8630b4a26f10 100644
>>> --- a/include/drm/gpu_scheduler.h
>>> +++ b/include/drm/gpu_scheduler.h
>>> @@ -509,6 +509,17 @@ struct drm_sched_backend_ops {
>>> * and it's time to clean it up.
>>> */
>>> void (*free_job)(struct drm_sched_job *sched_job);
>>> +
>>> + /**
>>> + * @kill_fence_context: kill the fence context belonging to this scheduler
>>
>> Which fence context would that be? ;)
>>
>> Also, "fence context" would be a new terminology in gpu_scheduler.h
>> API
>> level. You could call it ->sched_fini() or similar to signify at
>> which
>> point in the API it gets called and then the fact it takes sched as
>> parameter would be natural.
>>
>> We also probably want some commentary on the topic of indefinite (or
>> very long at least) blocking a thread exit / SIGINT/TERM/KILL time.
>> Cover letter touches upon that problem but I don't see you address
>> it.
>> Is the idea to let drivers shoot themselves in the foot or what?
>
> You didn't seriously just write that, did you?
>
> Yes, my intention clearly has been since September to make things
> worse, more ambiguous and destroy drivers. That's why I'm probably the
> only individual after Sima (who added some of the FIXMEs) who ever
> bothered documenting all this mess here, and warning people about the
> five hundred pitfalls, like three different start and stop functions,
> implicit refcount rules and God knows which other insane hacks that
> simply move driver problems into common infrastructure.
>
> Reconsider your attitude.
I don't know what kind of attitude you think I was expressing. If the
last sentence was too much of a hyperbole for you, then sorry. The point
was in the paragraph ending with it: to document that the callback must
not block, or, if it has to, to discuss for how long, and perhaps
whether the scheduler code should impose a limit. Otherwise the
indefinite block on thread exit mentioned in the cover letter can still
happen. I don't see how raising those points is a criticism of your
overall work. (The fact that we don't see eye to eye on more state
machine vs. the ->cancel_job() alternative aside.)
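
For the record, something along these lines is what I have in mind (a
pseudocode-level sketch against this patch, not compile-tested; the
timeout value and the warning message are made up by me):

```c
/* Hypothetical bounded variant of drm_sched_cancel_jobs_and_wait().
 * Caps how long drm_sched_fini() can block so that a process stuck in
 * teardown remains killable. 10 seconds is an arbitrary placeholder.
 */
static void drm_sched_cancel_jobs_and_wait(struct drm_gpu_scheduler *sched)
{
	sched->ops->kill_fence_context(sched);

	/* Give up after the timeout instead of blocking forever. */
	if (!wait_event_timeout(sched->pending_list_waitque,
				drm_sched_no_jobs_pending(sched),
				msecs_to_jiffies(10000)))
		dev_warn(sched->dev,
			 "teardown timed out, pending jobs will leak\n");
}
```

Whether the right answer is a timeout, a killable wait, or simply a
documented hard requirement that the callback signals everything
promptly is exactly what I think needs writing down.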
Regards,
Tvrtko
>
> P.
>
>>
>> Regards,
>>
>> Tvrtko
>>
>>> + *
>>> + * Needed to cleanly tear the scheduler down in drm_sched_fini(). This
>>> + * callback will cause all hardware fences to be signalled by the driver,
>>> + * which, ultimately, ensures that all jobs get freed before teardown.
>>> + *
>>> + * This callback is optional, but it is highly recommended to implement it.
>>> + */
>>> + void (*kill_fence_context)(struct drm_gpu_scheduler *sched);
>>> };
>>>
>>> /**
>>
>