[PATCH v4 04/40] drm/sched: Add enqueue credit limit
Rob Clark
robdclark at chromium.org
Thu May 15 16:15:08 UTC 2025
On Thu, May 15, 2025 at 2:28 AM Philipp Stanner <phasta at mailbox.org> wrote:
>
> Hello,
>
> On Wed, 2025-05-14 at 09:59 -0700, Rob Clark wrote:
> > From: Rob Clark <robdclark at chromium.org>
> >
> > Similar to the existing credit limit mechanism, but applying to jobs
> > enqueued to the scheduler but not yet run.
> >
> > The use case is to put an upper bound on preallocated, and potentially
> > unneeded, pgtable pages. When this limit is exceeded, pushing new jobs
> > will block until the count drops below the limit.
>
> the commit message doesn't make clear why that's needed within the
> scheduler.
>
> From what I understand from the cover letter, this is a (rare?) Vulkan
> feature. And as important as Vulkan is, it's the drivers that implement
> support for it. I don't see why the scheduler is a blocker.
Maybe not rare, or at least it comes up with a group of deqp-vk tests ;-)
Basically it is a way to throttle userspace to prevent it from OoM'ing
itself. (I suppose userspace could throttle itself, but it doesn't
really know how much pre-allocation will need to be done for pgtable
updates.)
> All the knowledge about when to stop pushing into the entity is in the
> driver, and the scheduler obtains all the knowledge about that from the
> driver anyways.
>
> So you could do
>
> if (my_vulkan_condition())
>         drm_sched_entity_push_job();
>
> couldn't you?
It would need to reach in and use the sched's job_scheduled
wait_queue_head_t... if that isn't too ugly, maybe the rest could be
implemented on top of sched. But it seemed like a reasonable thing
for the scheduler to support directly.
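Roughly, the driver-side version would look something like this (just a
sketch; my_enqueued, my_limit and msm_job_credits() are made-up names,
not existing code):

   struct drm_gpu_scheduler *sched = job->base.sched;
   int ret;

   /* Block until enough previously pushed jobs have run: */
   ret = wait_event_interruptible(sched->job_scheduled,
                   atomic_read(&my_enqueued) <= my_limit);
   if (ret)
           return ret;

   atomic_add(msm_job_credits(job), &my_enqueued);
   drm_sched_entity_push_job(&job->base);

plus a matching atomic_sub() after the job actually runs, piggybacking
on the scheduler waking job_scheduled after run_job() (the same wakeup
this patch relies on).  Doable, but the driver ends up poking at
scheduler internals.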
> >
> > Signed-off-by: Rob Clark <robdclark at chromium.org>
> > ---
> >  drivers/gpu/drm/scheduler/sched_entity.c | 16 ++++++++++++++--
> >  drivers/gpu/drm/scheduler/sched_main.c   |  3 +++
> >  include/drm/gpu_scheduler.h              | 13 ++++++++++++-
> >  3 files changed, 29 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> > index dc0e60d2c14b..c5f688362a34 100644
> > --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > @@ -580,11 +580,21 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
> > * under common lock for the struct drm_sched_entity that was set up for
> > * @sched_job in drm_sched_job_init().
> > */
> > -void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
> > +int drm_sched_entity_push_job(struct drm_sched_job *sched_job)
>
> Return code would need to be documented in the docstring, too. If we'd
> go for that solution.
>
> > {
> > struct drm_sched_entity *entity = sched_job->entity;
> > + struct drm_gpu_scheduler *sched = sched_job->sched;
> > bool first;
> > ktime_t submit_ts;
> > + int ret;
> > +
> > + ret = wait_event_interruptible(
> > + sched->job_scheduled,
> > + atomic_read(&sched->enqueue_credit_count) <=
> > + sched->enqueue_credit_limit);
>
> This very significantly changes the function's semantics. This function
> is used in a great many drivers, and here it would be transformed into
> a function that can block.
>
> From what I see below those credits are to be optional. But even if, it
> needs to be clearly documented when a function can block.
Sure. The behavior changes only for drivers that use the
enqueue_credit_limit, so other drivers should be unaffected.
I can improve the docs.
(Maybe push_credit or something else would be a better name than
enqueue_credit?)
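E.g. something like this would make the opt-in explicit and keep it a
strict no-op for everyone else (just a sketch, using the same fields as
the patch):

   /* Only throttle when the driver opted in via enqueue_credit_limit: */
   if (sched->enqueue_credit_limit) {
           ret = wait_event_interruptible(sched->job_scheduled,
                           atomic_read(&sched->enqueue_credit_count) <=
                           sched->enqueue_credit_limit);
           if (ret)
                   return ret;
   }

Behavior-wise it shouldn't differ from the patch (drivers that never set
enqueue_credits keep the count at zero, so the condition is immediately
true and we never sleep), but it makes the intent easier to document.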
>
> > + if (ret)
> > + return ret;
> > + atomic_add(sched_job->enqueue_credits, &sched->enqueue_credit_count);
> >
> > trace_drm_sched_job(sched_job, entity);
> > atomic_inc(entity->rq->sched->score);
> > @@ -609,7 +619,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
> > spin_unlock(&entity->lock);
> >
> > DRM_ERROR("Trying to push to a killed
> > entity\n");
> > - return;
> > + return -EINVAL;
> > }
> >
> > rq = entity->rq;
> > @@ -626,5 +636,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
> >
> > drm_sched_wakeup(sched);
> > }
> > +
> > + return 0;
> > }
> > EXPORT_SYMBOL(drm_sched_entity_push_job);
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index 9412bffa8c74..1102cca69cb4 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -1217,6 +1217,7 @@ static void drm_sched_run_job_work(struct work_struct *w)
> >
> > trace_drm_run_job(sched_job, entity);
> > fence = sched->ops->run_job(sched_job);
> > + atomic_sub(sched_job->enqueue_credits, &sched->enqueue_credit_count);
> > complete_all(&entity->entity_idle);
> > drm_sched_fence_scheduled(s_fence, fence);
> >
> > @@ -1253,6 +1254,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched, const struct drm_sched_init_
> >
> > sched->ops = args->ops;
> > sched->credit_limit = args->credit_limit;
> > + sched->enqueue_credit_limit = args->enqueue_credit_limit;
> > sched->name = args->name;
> > sched->timeout = args->timeout;
> > sched->hang_limit = args->hang_limit;
> > @@ -1308,6 +1310,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched, const struct drm_sched_init_
> > INIT_LIST_HEAD(&sched->pending_list);
> > spin_lock_init(&sched->job_list_lock);
> > atomic_set(&sched->credit_count, 0);
> > + atomic_set(&sched->enqueue_credit_count, 0);
> > INIT_DELAYED_WORK(&sched->work_tdr, drm_sched_job_timedout);
> > INIT_WORK(&sched->work_run_job, drm_sched_run_job_work);
> > INIT_WORK(&sched->work_free_job, drm_sched_free_job_work);
> > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > index da64232c989d..d830ffe083f1 100644
> > --- a/include/drm/gpu_scheduler.h
> > +++ b/include/drm/gpu_scheduler.h
> > @@ -329,6 +329,7 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f);
> > * @s_fence: contains the fences for the scheduling of job.
> > * @finish_cb: the callback for the finished fence.
> > * @credits: the number of credits this job contributes to the scheduler
> > + * @enqueue_credits: the number of enqueue credits this job contributes
> > * @work: Helper to reschedule job kill to different context.
> > * @id: a unique id assigned to each job scheduled on the scheduler.
> > * @karma: increment on every hang caused by this job. If this exceeds the hang
> > @@ -366,6 +367,7 @@ struct drm_sched_job {
> >
> > enum drm_sched_priority s_priority;
> > u32 credits;
> > + u32 enqueue_credits;
>
> What's the policy of setting this?
>
> drm_sched_job_init() and drm_sched_job_arm() are responsible for
> initializing jobs.
It should be set before drm_sched_entity_push_job(). I wouldn't
really expect drivers to know the value at drm_sched_job_init() time.
But they would by the time drm_sched_entity_push_job() is called.
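Ie. roughly this ordering on the driver side (made-up names, just to
show where the value gets filled in):

   ret = drm_sched_job_init(&job->base, &queue->entity, 1, queue);

   /* ... dependency setup, pgtable page preallocation ... */

   /* only known now, after preallocating pgtable pages: */
   job->base.enqueue_credits = job->prealloc_count;

   drm_sched_job_arm(&job->base);
   ret = drm_sched_entity_push_job(&job->base);   /* may now block */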
> > /** @last_dependency: tracks @dependencies as they signal */
> > unsigned int last_dependency;
> > atomic_t karma;
> > @@ -485,6 +487,10 @@ struct drm_sched_backend_ops {
> > * @ops: backend operations provided by the driver.
> > * @credit_limit: the credit limit of this scheduler
> > * @credit_count: the current credit count of this scheduler
> > + * @enqueue_credit_limit: the credit limit of jobs pushed to scheduler and not
> > + * yet run
> > + * @enqueue_credit_count: the current credit count of jobs pushed to scheduler
> > + * but not yet run
> > * @timeout: the time after which a job is removed from the scheduler.
> > * @name: name of the ring for which this scheduler is being used.
> > * @num_rqs: Number of run-queues. This is at most DRM_SCHED_PRIORITY_COUNT,
> > @@ -518,6 +524,8 @@ struct drm_gpu_scheduler {
> > const struct drm_sched_backend_ops *ops;
> > u32 credit_limit;
> > atomic_t credit_count;
> > + u32 enqueue_credit_limit;
> > + atomic_t enqueue_credit_count;
> > long timeout;
> > const char *name;
> > u32 num_rqs;
> > @@ -550,6 +558,8 @@ struct drm_gpu_scheduler {
> > * @num_rqs: Number of run-queues. This may be at most DRM_SCHED_PRIORITY_COUNT,
> > * as there's usually one run-queue per priority, but may be less.
> > * @credit_limit: the number of credits this scheduler can hold from all jobs
> > + * @enqueue_credit_limit: the number of credits that can be enqueued before
> > + * drm_sched_entity_push_job() blocks
>
> Is it optional or not? Can it be deactivated?
>
> It seems to me that it is optional, and so far only used in msm. If
> there are no other parties in need for that mechanism, the right place
> to have this feature probably is msm, which has all the knowledge about
> when to block already.
>
As with the existing credit_limit, it is optional.  Although I think it
would also be useful for other drivers that use drm sched for VM_BIND
queues, for the same reason.
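E.g. a VM_BIND queue could just do something like this (sketch, values
made up):

   const struct drm_sched_init_args args = {
           .ops = &my_vm_bind_sched_ops,
           .name = "vm_bind",
           .credit_limit = 1,
           .enqueue_credit_limit = 1024,  /* throttle pushes past this */
           /* ... */
   };

   ret = drm_sched_init(&sched, &args);

sized to however much pgtable preallocation the driver is willing to
keep outstanding.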
BR,
-R
>
> Regards
> P.
>
>
> > * @hang_limit: number of times to allow a job to hang before dropping it.
> > * This mechanism is DEPRECATED. Set it to 0.
> > * @timeout: timeout value in jiffies for submitted jobs.
> > @@ -564,6 +574,7 @@ struct drm_sched_init_args {
> > struct workqueue_struct *timeout_wq;
> > u32 num_rqs;
> > u32 credit_limit;
> > + u32 enqueue_credit_limit;
> > unsigned int hang_limit;
> > long timeout;
> > atomic_t *score;
> > @@ -600,7 +611,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
> > struct drm_sched_entity *entity,
> > u32 credits, void *owner);
> > void drm_sched_job_arm(struct drm_sched_job *job);
> > -void drm_sched_entity_push_job(struct drm_sched_job *sched_job);
> > +int drm_sched_entity_push_job(struct drm_sched_job *sched_job);
> > int drm_sched_job_add_dependency(struct drm_sched_job *job,
> > struct dma_fence *fence);
> > int drm_sched_job_add_syncobj_dependency(struct drm_sched_job *job,
>