[RFC] drm/sched: Allow drivers to register for pending list cleanup
Philipp Stanner
phasta at mailbox.org
Tue Apr 22 05:43:54 UTC 2025
On Fri, 2025-04-18 at 12:32 +0100, Tvrtko Ursulin wrote:
> Quick sketch idea for an alternative to
> https://lore.kernel.org/dri-devel/20250407152239.34429-2-phasta@kernel.org/
> .
>
> It is possible it achieves the same effect but with less code and not
> further growing the internal state machine. The callback it adds is
> also
> aligned in spirit (prototype) with other drm_sched_backend_ops
> callbacks.
>
> The new callback is ->cancel_job(job) which is called after
> drm_sched_fini() stops the workqueue for all jobs in the
> pending_list.
> After which, instead of waiting on the free worker to free jobs one
> by
> one, all are freed directly.
>
> As a demonstration we can remove the driver specific cleanup code
> from the
> mock scheduler. (End result tested for no memory leaks or crashes.)
>
> Please check if I am missed some gotcha etc. And if nouveau can by
> made to
> use it.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
> Cc: Christian König <ckoenig.leichtzumerken at gmail.com>
> Cc: Danilo Krummrich <dakr at kernel.org>
> Cc: Matthew Brost <matthew.brost at intel.com>
> Cc: Philipp Stanner <phasta at kernel.org>
> ---
> drivers/gpu/drm/scheduler/sched_main.c | 33 ++++----
> .../gpu/drm/scheduler/tests/mock_scheduler.c | 76 ++++++++---------
> --
> drivers/gpu/drm/scheduler/tests/sched_tests.h | 6 +-
> drivers/gpu/drm/scheduler/tests/tests_basic.c | 1 +
> include/drm/gpu_scheduler.h | 20 +++++
> 5 files changed, 77 insertions(+), 59 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 829579c41c6b..6be11befef68 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1349,25 +1349,23 @@ int drm_sched_init(struct drm_gpu_scheduler
> *sched, const struct drm_sched_init_
> EXPORT_SYMBOL(drm_sched_init);
>
> /**
> - * drm_sched_fini - Destroy a gpu scheduler
> + * drm_sched_fini - Tear down and clean up the scheduler
> *
> * @sched: scheduler instance
> *
> - * Tears down and cleans up the scheduler.
> + * In the process of tear down and cleanup this stops submission of
> new jobs to
> + * the hardware through drm_sched_backend_ops.run_job(), as well as
> freeing of
> + * completed jobs via drm_sched_backend_ops.free_job().
> *
> - * This stops submission of new jobs to the hardware through
> - * drm_sched_backend_ops.run_job(). Consequently,
> drm_sched_backend_ops.free_job()
> - * will not be called for all jobs still in
> drm_gpu_scheduler.pending_list.
> - * There is no solution for this currently. Thus, it is up to the
> driver to make
> - * sure that:
> + * If the driver does not implement
> drm_sched_backend_ops.cleanup_job(), which
> + * is recommended, drm_sched_backend_ops.free_job() will not be
> called for all
> + * jobs still in drm_gpu_scheduler.pending_list. In this case it is
> up to the
> + * driver to make sure that:
> *
> - * a) drm_sched_fini() is only called after for all submitted jobs
> - * drm_sched_backend_ops.free_job() has been called or that
> - * b) the jobs for which drm_sched_backend_ops.free_job() has not
> been called
> + * a) Drm_sched_fini() is only called after for all submitted jobs
> + * drm_sched_backend_ops.free_job() has been called or that;
> + * b) The jobs for which drm_sched_backend_ops.free_job() has not
> been called
> * after drm_sched_fini() ran are freed manually.
> - *
> - * FIXME: Take care of the above problem and prevent this function
> from leaking
> - * the jobs in drm_gpu_scheduler.pending_list under any
> circumstances.
> */
> void drm_sched_fini(struct drm_gpu_scheduler *sched)
> {
> @@ -1397,6 +1395,15 @@ void drm_sched_fini(struct drm_gpu_scheduler
> *sched)
> /* Confirm no work left behind accessing device structures
> */
> cancel_delayed_work_sync(&sched->work_tdr);
>
> + if (sched->ops->cancel_job) {
> + struct drm_sched_job *job;
> +
> + list_for_each_entry_reverse(job, &sched-
> >pending_list, list) {
> + sched->ops->cancel_job(job);
> + sched->ops->free_job(job);
> + }
> + }
> +
> if (sched->own_submit_wq)
> destroy_workqueue(sched->submit_wq);
> sched->ready = false;
> diff --git a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
> b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
> index f999c8859cf7..3c6be5ca26db 100644
> --- a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
> +++ b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
> @@ -63,7 +63,7 @@ static void drm_mock_sched_job_complete(struct
> drm_mock_sched_job *job)
> lockdep_assert_held(&sched->lock);
>
> job->flags |= DRM_MOCK_SCHED_JOB_DONE;
> - list_move_tail(&job->link, &sched->done_list);
> + list_del(&job->link);
> dma_fence_signal(&job->hw_fence);
> complete(&job->done);
> }
> @@ -200,29 +200,49 @@ static struct dma_fence
> *mock_sched_run_job(struct drm_sched_job *sched_job)
> return &job->hw_fence;
> }
>
> +static void mock_sched_cancel_job(struct drm_sched_job *sched_job)
> +{
> + struct drm_mock_scheduler *sched =
> + drm_sched_to_mock_sched(sched_job->sched);
> + struct drm_mock_sched_job *job =
> drm_sched_job_to_mock_job(sched_job);
> +
> + hrtimer_cancel(&job->timer);
> +
> + spin_lock_irq(&sched->lock);
> + spin_lock(&job->lock);
> + if (!dma_fence_is_signaled_locked(&job->hw_fence)) {
> + job->flags |= DRM_MOCK_SCHED_JOB_CANCELED;
> + list_del(&job->link);
> + dma_fence_set_error(&job->hw_fence, -ECANCELED);
> + dma_fence_signal_locked(&job->hw_fence);
> + complete(&job->done);
> + }
> + spin_unlock(&job->lock);
> + spin_unlock_irq(&sched->lock);
> +}
> +
> static enum drm_gpu_sched_stat
> mock_sched_timedout_job(struct drm_sched_job *sched_job)
> {
> + struct drm_mock_scheduler *sched =
> + drm_sched_to_mock_sched(sched_job->sched);
> struct drm_mock_sched_job *job =
> drm_sched_job_to_mock_job(sched_job);
>
> + spin_lock_irq(&sched->lock);
> job->flags |= DRM_MOCK_SCHED_JOB_TIMEDOUT;
> + list_del(&job->link);
> + dma_fence_set_error(&job->hw_fence, -ETIMEDOUT);
> + dma_fence_signal(&job->hw_fence);
> + spin_unlock_irq(&sched->lock);
>
> return DRM_GPU_SCHED_STAT_NOMINAL;
> }
>
> -static void mock_sched_free_job(struct drm_sched_job *sched_job)
> +void drm_mock_sched_job_free(struct drm_sched_job *sched_job)
> {
> - struct drm_mock_scheduler *sched =
> - drm_sched_to_mock_sched(sched_job->sched);
> struct drm_mock_sched_job *job =
> drm_sched_job_to_mock_job(sched_job);
> - unsigned long flags;
>
> - /* Remove from the scheduler done list. */
> - spin_lock_irqsave(&sched->lock, flags);
> - list_del(&job->link);
> - spin_unlock_irqrestore(&sched->lock, flags);
> dma_fence_put(&job->hw_fence);
> -
> drm_sched_job_cleanup(sched_job);
>
> /* Mock job itself is freed by the kunit framework. */
> @@ -230,8 +250,9 @@ static void mock_sched_free_job(struct
> drm_sched_job *sched_job)
>
> static const struct drm_sched_backend_ops drm_mock_scheduler_ops = {
> .run_job = mock_sched_run_job,
> + .cancel_job = mock_sched_cancel_job,
> .timedout_job = mock_sched_timedout_job,
> - .free_job = mock_sched_free_job
> + .free_job = drm_mock_sched_job_free
> };
>
> /**
> @@ -265,7 +286,6 @@ struct drm_mock_scheduler
> *drm_mock_sched_new(struct kunit *test, long timeout)
> sched->hw_timeline.context = dma_fence_context_alloc(1);
> atomic_set(&sched->hw_timeline.next_seqno, 0);
> INIT_LIST_HEAD(&sched->job_list);
> - INIT_LIST_HEAD(&sched->done_list);
> spin_lock_init(&sched->lock);
>
> return sched;
> @@ -280,38 +300,6 @@ struct drm_mock_scheduler
> *drm_mock_sched_new(struct kunit *test, long timeout)
> */
> void drm_mock_sched_fini(struct drm_mock_scheduler *sched)
> {
> - struct drm_mock_sched_job *job, *next;
> - unsigned long flags;
> - LIST_HEAD(list);
> -
> - drm_sched_wqueue_stop(&sched->base);
> -
> - /* Force complete all unfinished jobs. */
> - spin_lock_irqsave(&sched->lock, flags);
> - list_for_each_entry_safe(job, next, &sched->job_list, link)
> - list_move_tail(&job->link, &list);
> - spin_unlock_irqrestore(&sched->lock, flags);
> -
> - list_for_each_entry(job, &list, link)
> - hrtimer_cancel(&job->timer);
> -
> - spin_lock_irqsave(&sched->lock, flags);
> - list_for_each_entry_safe(job, next, &list, link)
> - drm_mock_sched_job_complete(job);
> - spin_unlock_irqrestore(&sched->lock, flags);
> -
> - /*
> - * Free completed jobs and jobs not yet processed by the DRM
> scheduler
> - * free worker.
> - */
> - spin_lock_irqsave(&sched->lock, flags);
> - list_for_each_entry_safe(job, next, &sched->done_list, link)
> - list_move_tail(&job->link, &list);
> - spin_unlock_irqrestore(&sched->lock, flags);
> -
> - list_for_each_entry_safe(job, next, &list, link)
> - mock_sched_free_job(&job->base);
> -
> drm_sched_fini(&sched->base);
> }
>
> diff --git a/drivers/gpu/drm/scheduler/tests/sched_tests.h
> b/drivers/gpu/drm/scheduler/tests/sched_tests.h
> index 27caf8285fb7..7b4e09ad6858 100644
> --- a/drivers/gpu/drm/scheduler/tests/sched_tests.h
> +++ b/drivers/gpu/drm/scheduler/tests/sched_tests.h
> @@ -49,7 +49,6 @@ struct drm_mock_scheduler {
>
> spinlock_t lock;
> struct list_head job_list;
> - struct list_head done_list;
>
> struct {
> u64 context;
> @@ -97,7 +96,8 @@ struct drm_mock_sched_job {
> struct completion done;
>
> #define DRM_MOCK_SCHED_JOB_DONE 0x1
> -#define DRM_MOCK_SCHED_JOB_TIMEDOUT 0x2
> +#define DRM_MOCK_SCHED_JOB_CANCELED 0x2
> +#define DRM_MOCK_SCHED_JOB_TIMEDOUT 0x4
> unsigned long flags;
>
> struct list_head link;
> @@ -146,6 +146,8 @@ struct drm_mock_sched_job *
> drm_mock_sched_job_new(struct kunit *test,
> struct drm_mock_sched_entity *entity);
>
> +void drm_mock_sched_job_free(struct drm_sched_job *sched_job);
> +
> /**
> * drm_mock_sched_job_submit - Arm and submit a job in one go
> *
> diff --git a/drivers/gpu/drm/scheduler/tests/tests_basic.c
> b/drivers/gpu/drm/scheduler/tests/tests_basic.c
> index 7230057e0594..968b3046745f 100644
> --- a/drivers/gpu/drm/scheduler/tests/tests_basic.c
> +++ b/drivers/gpu/drm/scheduler/tests/tests_basic.c
> @@ -241,6 +241,7 @@ static void drm_sched_basic_timeout(struct kunit
> *test)
> job->flags & DRM_MOCK_SCHED_JOB_TIMEDOUT,
> DRM_MOCK_SCHED_JOB_TIMEDOUT);
>
> + drm_mock_sched_job_free(&job->base);
> drm_mock_sched_entity_free(entity);
> }
>
> diff --git a/include/drm/gpu_scheduler.h
> b/include/drm/gpu_scheduler.h
> index 1a7e377d4cbb..0bcbc3ce8188 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -503,6 +503,26 @@ struct drm_sched_backend_ops {
> */
> enum drm_gpu_sched_stat (*timedout_job)(struct drm_sched_job
> *sched_job);
>
> + /**
> + * @cancel_job: Called during pending job cleanup on
> scheduler destroy
> + *
> + * @sched_job: The job to cancel
> + *
> + * Called from drm_sched_fini() for every job on the
> + * &drm_sched.pending_list after scheduler workqueues have
> been stopped
> + * in drm_sched_fini().
> + *
> + * Job should either be allowed to finish or revoked from
> the backend
> + * and signaled with an appropriate fence errno (-
> ECANCELED). After the
> + * callback returns scheduler will call
> + * &drm_sched_backend_ops.free_job() after which scheduler
> teardown will
> + * proceed.
> + *
> + * Callback is optional but recommended for avoiding memory
> leaks after
> + * scheduler tear down.
> + */
> + void (*cancel_job)(struct drm_sched_job *sched_job);
Tvrtko, sorry but this looks absolutely like a reimplementation of my
fix, the sole difference being that the callback has a different name.
Everything else is the same, even the check for the callback.
The only difference I might be able to see (if I try really hard) is
that you don't kill the entire fence context, but cancel job-by-job.
But let's look at it, what does it "kill the fence context" mean? It
means that all fences belonging to the context have to be signaled with
an error – precisely what you and I do, but with different names.
So I really don't get why you send this patch after I literally spend
months figuring this path forward out.
P.
> +
> /**
> * @free_job: Called once the job's finished fence has been
> signaled
> * and it's time to clean it up.
More information about the dri-devel
mailing list