[RFC] drm/sched: Allow drivers to register for pending list cleanup
Tvrtko Ursulin
tvrtko.ursulin at igalia.com
Tue Apr 22 10:53:52 UTC 2025
On 22/04/2025 06:43, Philipp Stanner wrote:
> On Fri, 2025-04-18 at 12:32 +0100, Tvrtko Ursulin wrote:
>> Quick sketch idea for an alternative to
>> https://lore.kernel.org/dri-devel/20250407152239.34429-2-phasta@kernel.org/
>> .
>>
>> It is possible it achieves the same effect but with less code and not
>> further growing the internal state machine. The callback it adds is
>> also
>> aligned in spirit (prototype) with other drm_sched_backend_ops
>> callbacks.
>>
>> The new callback is ->cancel_job(job) which is called after
>> drm_sched_fini() stops the workqueue for all jobs in the
>> pending_list.
>> After which, instead of waiting on the free worker to free jobs one
>> by
>> one, all are freed directly.
>>
>> As a demonstration we can remove the driver specific cleanup code
>> from the
>> mock scheduler. (End result tested for no memory leaks or crashes.)
>>
>> Please check if I am missed some gotcha etc. And if nouveau can by
>> made to
>> use it.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>> Cc: Christian König <ckoenig.leichtzumerken at gmail.com>
>> Cc: Danilo Krummrich <dakr at kernel.org>
>> Cc: Matthew Brost <matthew.brost at intel.com>
>> Cc: Philipp Stanner <phasta at kernel.org>
>> ---
>> drivers/gpu/drm/scheduler/sched_main.c | 33 ++++----
>> .../gpu/drm/scheduler/tests/mock_scheduler.c | 76 ++++++++---------
>> --
>> drivers/gpu/drm/scheduler/tests/sched_tests.h | 6 +-
>> drivers/gpu/drm/scheduler/tests/tests_basic.c | 1 +
>> include/drm/gpu_scheduler.h | 20 +++++
>> 5 files changed, 77 insertions(+), 59 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>> b/drivers/gpu/drm/scheduler/sched_main.c
>> index 829579c41c6b..6be11befef68 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -1349,25 +1349,23 @@ int drm_sched_init(struct drm_gpu_scheduler
>> *sched, const struct drm_sched_init_
>> EXPORT_SYMBOL(drm_sched_init);
>>
>> /**
>> - * drm_sched_fini - Destroy a gpu scheduler
>> + * drm_sched_fini - Tear down and clean up the scheduler
>> *
>> * @sched: scheduler instance
>> *
>> - * Tears down and cleans up the scheduler.
>> + * In the process of tear down and cleanup this stops submission of
>> new jobs to
>> + * the hardware through drm_sched_backend_ops.run_job(), as well as
>> freeing of
>> + * completed jobs via drm_sched_backend_ops.free_job().
>> *
>> - * This stops submission of new jobs to the hardware through
>> - * drm_sched_backend_ops.run_job(). Consequently,
>> drm_sched_backend_ops.free_job()
>> - * will not be called for all jobs still in
>> drm_gpu_scheduler.pending_list.
>> - * There is no solution for this currently. Thus, it is up to the
>> driver to make
>> - * sure that:
>> + * If the driver does not implement
>> drm_sched_backend_ops.cleanup_job(), which
>> + * is recommended, drm_sched_backend_ops.free_job() will not be
>> called for all
>> + * jobs still in drm_gpu_scheduler.pending_list. In this case it is
>> up to the
>> + * driver to make sure that:
>> *
>> - * a) drm_sched_fini() is only called after for all submitted jobs
>> - * drm_sched_backend_ops.free_job() has been called or that
>> - * b) the jobs for which drm_sched_backend_ops.free_job() has not
>> been called
>> + * a) Drm_sched_fini() is only called after for all submitted jobs
>> + * drm_sched_backend_ops.free_job() has been called or that;
>> + * b) The jobs for which drm_sched_backend_ops.free_job() has not
>> been called
>> * after drm_sched_fini() ran are freed manually.
>> - *
>> - * FIXME: Take care of the above problem and prevent this function
>> from leaking
>> - * the jobs in drm_gpu_scheduler.pending_list under any
>> circumstances.
>> */
>> void drm_sched_fini(struct drm_gpu_scheduler *sched)
>> {
>> @@ -1397,6 +1395,15 @@ void drm_sched_fini(struct drm_gpu_scheduler
>> *sched)
>> /* Confirm no work left behind accessing device structures
>> */
>> cancel_delayed_work_sync(&sched->work_tdr);
>>
>> + if (sched->ops->cancel_job) {
>> + struct drm_sched_job *job;
>> +
>> + list_for_each_entry_reverse(job, &sched-
>>> pending_list, list) {
>> + sched->ops->cancel_job(job);
>> + sched->ops->free_job(job);
>> + }
>> + }
>> +
>> if (sched->own_submit_wq)
>> destroy_workqueue(sched->submit_wq);
>> sched->ready = false;
>> diff --git a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
>> b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
>> index f999c8859cf7..3c6be5ca26db 100644
>> --- a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
>> +++ b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
>> @@ -63,7 +63,7 @@ static void drm_mock_sched_job_complete(struct
>> drm_mock_sched_job *job)
>> lockdep_assert_held(&sched->lock);
>>
>> job->flags |= DRM_MOCK_SCHED_JOB_DONE;
>> - list_move_tail(&job->link, &sched->done_list);
>> + list_del(&job->link);
>> dma_fence_signal(&job->hw_fence);
>> complete(&job->done);
>> }
>> @@ -200,29 +200,49 @@ static struct dma_fence
>> *mock_sched_run_job(struct drm_sched_job *sched_job)
>> return &job->hw_fence;
>> }
>>
>> +static void mock_sched_cancel_job(struct drm_sched_job *sched_job)
>> +{
>> + struct drm_mock_scheduler *sched =
>> + drm_sched_to_mock_sched(sched_job->sched);
>> + struct drm_mock_sched_job *job =
>> drm_sched_job_to_mock_job(sched_job);
>> +
>> + hrtimer_cancel(&job->timer);
>> +
>> + spin_lock_irq(&sched->lock);
>> + spin_lock(&job->lock);
>> + if (!dma_fence_is_signaled_locked(&job->hw_fence)) {
>> + job->flags |= DRM_MOCK_SCHED_JOB_CANCELED;
>> + list_del(&job->link);
>> + dma_fence_set_error(&job->hw_fence, -ECANCELED);
>> + dma_fence_signal_locked(&job->hw_fence);
>> + complete(&job->done);
>> + }
>> + spin_unlock(&job->lock);
>> + spin_unlock_irq(&sched->lock);
>> +}
>> +
>> static enum drm_gpu_sched_stat
>> mock_sched_timedout_job(struct drm_sched_job *sched_job)
>> {
>> + struct drm_mock_scheduler *sched =
>> + drm_sched_to_mock_sched(sched_job->sched);
>> struct drm_mock_sched_job *job =
>> drm_sched_job_to_mock_job(sched_job);
>>
>> + spin_lock_irq(&sched->lock);
>> job->flags |= DRM_MOCK_SCHED_JOB_TIMEDOUT;
>> + list_del(&job->link);
>> + dma_fence_set_error(&job->hw_fence, -ETIMEDOUT);
>> + dma_fence_signal(&job->hw_fence);
>> + spin_unlock_irq(&sched->lock);
>>
>> return DRM_GPU_SCHED_STAT_NOMINAL;
>> }
>>
>> -static void mock_sched_free_job(struct drm_sched_job *sched_job)
>> +void drm_mock_sched_job_free(struct drm_sched_job *sched_job)
>> {
>> - struct drm_mock_scheduler *sched =
>> - drm_sched_to_mock_sched(sched_job->sched);
>> struct drm_mock_sched_job *job =
>> drm_sched_job_to_mock_job(sched_job);
>> - unsigned long flags;
>>
>> - /* Remove from the scheduler done list. */
>> - spin_lock_irqsave(&sched->lock, flags);
>> - list_del(&job->link);
>> - spin_unlock_irqrestore(&sched->lock, flags);
>> dma_fence_put(&job->hw_fence);
>> -
>> drm_sched_job_cleanup(sched_job);
>>
>> /* Mock job itself is freed by the kunit framework. */
>> @@ -230,8 +250,9 @@ static void mock_sched_free_job(struct
>> drm_sched_job *sched_job)
>>
>> static const struct drm_sched_backend_ops drm_mock_scheduler_ops = {
>> .run_job = mock_sched_run_job,
>> + .cancel_job = mock_sched_cancel_job,
>> .timedout_job = mock_sched_timedout_job,
>> - .free_job = mock_sched_free_job
>> + .free_job = drm_mock_sched_job_free
>> };
>>
>> /**
>> @@ -265,7 +286,6 @@ struct drm_mock_scheduler
>> *drm_mock_sched_new(struct kunit *test, long timeout)
>> sched->hw_timeline.context = dma_fence_context_alloc(1);
>> atomic_set(&sched->hw_timeline.next_seqno, 0);
>> INIT_LIST_HEAD(&sched->job_list);
>> - INIT_LIST_HEAD(&sched->done_list);
>> spin_lock_init(&sched->lock);
>>
>> return sched;
>> @@ -280,38 +300,6 @@ struct drm_mock_scheduler
>> *drm_mock_sched_new(struct kunit *test, long timeout)
>> */
>> void drm_mock_sched_fini(struct drm_mock_scheduler *sched)
>> {
>> - struct drm_mock_sched_job *job, *next;
>> - unsigned long flags;
>> - LIST_HEAD(list);
>> -
>> - drm_sched_wqueue_stop(&sched->base);
>> -
>> - /* Force complete all unfinished jobs. */
>> - spin_lock_irqsave(&sched->lock, flags);
>> - list_for_each_entry_safe(job, next, &sched->job_list, link)
>> - list_move_tail(&job->link, &list);
>> - spin_unlock_irqrestore(&sched->lock, flags);
>> -
>> - list_for_each_entry(job, &list, link)
>> - hrtimer_cancel(&job->timer);
>> -
>> - spin_lock_irqsave(&sched->lock, flags);
>> - list_for_each_entry_safe(job, next, &list, link)
>> - drm_mock_sched_job_complete(job);
>> - spin_unlock_irqrestore(&sched->lock, flags);
>> -
>> - /*
>> - * Free completed jobs and jobs not yet processed by the DRM
>> scheduler
>> - * free worker.
>> - */
>> - spin_lock_irqsave(&sched->lock, flags);
>> - list_for_each_entry_safe(job, next, &sched->done_list, link)
>> - list_move_tail(&job->link, &list);
>> - spin_unlock_irqrestore(&sched->lock, flags);
>> -
>> - list_for_each_entry_safe(job, next, &list, link)
>> - mock_sched_free_job(&job->base);
>> -
>> drm_sched_fini(&sched->base);
>> }
>>
>> diff --git a/drivers/gpu/drm/scheduler/tests/sched_tests.h
>> b/drivers/gpu/drm/scheduler/tests/sched_tests.h
>> index 27caf8285fb7..7b4e09ad6858 100644
>> --- a/drivers/gpu/drm/scheduler/tests/sched_tests.h
>> +++ b/drivers/gpu/drm/scheduler/tests/sched_tests.h
>> @@ -49,7 +49,6 @@ struct drm_mock_scheduler {
>>
>> spinlock_t lock;
>> struct list_head job_list;
>> - struct list_head done_list;
>>
>> struct {
>> u64 context;
>> @@ -97,7 +96,8 @@ struct drm_mock_sched_job {
>> struct completion done;
>>
>> #define DRM_MOCK_SCHED_JOB_DONE 0x1
>> -#define DRM_MOCK_SCHED_JOB_TIMEDOUT 0x2
>> +#define DRM_MOCK_SCHED_JOB_CANCELED 0x2
>> +#define DRM_MOCK_SCHED_JOB_TIMEDOUT 0x4
>> unsigned long flags;
>>
>> struct list_head link;
>> @@ -146,6 +146,8 @@ struct drm_mock_sched_job *
>> drm_mock_sched_job_new(struct kunit *test,
>> struct drm_mock_sched_entity *entity);
>>
>> +void drm_mock_sched_job_free(struct drm_sched_job *sched_job);
>> +
>> /**
>> * drm_mock_sched_job_submit - Arm and submit a job in one go
>> *
>> diff --git a/drivers/gpu/drm/scheduler/tests/tests_basic.c
>> b/drivers/gpu/drm/scheduler/tests/tests_basic.c
>> index 7230057e0594..968b3046745f 100644
>> --- a/drivers/gpu/drm/scheduler/tests/tests_basic.c
>> +++ b/drivers/gpu/drm/scheduler/tests/tests_basic.c
>> @@ -241,6 +241,7 @@ static void drm_sched_basic_timeout(struct kunit
>> *test)
>> job->flags & DRM_MOCK_SCHED_JOB_TIMEDOUT,
>> DRM_MOCK_SCHED_JOB_TIMEDOUT);
>>
>> + drm_mock_sched_job_free(&job->base);
>> drm_mock_sched_entity_free(entity);
>> }
>>
>> diff --git a/include/drm/gpu_scheduler.h
>> b/include/drm/gpu_scheduler.h
>> index 1a7e377d4cbb..0bcbc3ce8188 100644
>> --- a/include/drm/gpu_scheduler.h
>> +++ b/include/drm/gpu_scheduler.h
>> @@ -503,6 +503,26 @@ struct drm_sched_backend_ops {
>> */
>> enum drm_gpu_sched_stat (*timedout_job)(struct drm_sched_job
>> *sched_job);
>>
>> + /**
>> + * @cancel_job: Called during pending job cleanup on
>> scheduler destroy
>> + *
>> + * @sched_job: The job to cancel
>> + *
>> + * Called from drm_sched_fini() for every job on the
>> + * &drm_sched.pending_list after scheduler workqueues have
>> been stopped
>> + * in drm_sched_fini().
>> + *
>> + * Job should either be allowed to finish or revoked from
>> the backend
>> + * and signaled with an appropriate fence errno (-
>> ECANCELED). After the
>> + * callback returns scheduler will call
>> + * &drm_sched_backend_ops.free_job() after which scheduler
>> teardown will
>> + * proceed.
>> + *
>> + * Callback is optional but recommended for avoiding memory
>> leaks after
>> + * scheduler tear down.
>> + */
>> + void (*cancel_job)(struct drm_sched_job *sched_job);
>
>
> Tvrtko, sorry but this looks absolutely like a reimplementation of my
> fix, the sole difference being that the callback has a different name.
>
> Everything else is the same, even the check for the callback.
>
> The only difference I might be able to see (if I try really hard) is
> that you don't kill the entire fence context, but cancel job-by-job.
>
> But let's look at it, what does it "kill the fence context" mean? It
> means that all fences belonging to the context have to be signaled with
> an error – precisely what you and I do, but with different names.
>
> So I really don't get why you send this patch after I literally spend
> months figuring this path forward out.
After I read your series and understood what it is trying to solve,
exchanged a few emails with Danilo, it felt it would be easier to
propose an alternative in code rather than in words. I thought I
explained the motivation in the cover letter. Maybe not well enough.
Let me try and re-summarise:
* It is *way* less code - no new flags no new state machine, no new
waitqueues.
* New callback is aligned with the other callbacks. It works on jobs
(as all others), does not add new terminology into the API and it does
not *mandate* drivers *must* keep internal lists. Any way to cancel a
job and scheduler core is happy.
Regards,
Tvrtko
>
> P.
>
>
>
>> +
>> /**
>> * @free_job: Called once the job's finished fence has been
>> signaled
>> * and it's time to clean it up.
>
More information about the dri-devel
mailing list