[PATCH v4 04/40] drm/sched: Add enqueue credit limit
Rob Clark
robdclark at chromium.org
Thu May 15 16:15:08 UTC 2025
On Thu, May 15, 2025 at 2:28 AM Philipp Stanner <phasta at mailbox.org> wrote:
>
> Hello,
>
> On Wed, 2025-05-14 at 09:59 -0700, Rob Clark wrote:
> > From: Rob Clark <robdclark at chromium.org>
> >
> > Similar to the existing credit limit mechanism, but applying to jobs
> > enqueued to the scheduler but not yet run.
> >
> > The use case is to put an upper bound on preallocated, and potentially
> > unneeded, pgtable pages. When this limit is exceeded, pushing new jobs
> > will block until the count drops below the limit.
>
> the commit message doesn't make clear why that's needed within the
> scheduler.
>
> From what I understand from the cover letter, this is a (rare?) Vulkan
> feature. And as important as Vulkan is, it's the drivers that implement
> support for it. I don't see why the scheduler is a blocker.
Maybe not rare, or at least it comes up with a group of deqp-vk tests ;-)
Basically it is a way to throttle userspace to prevent it from OoM'ing
itself. (I suppose userspace could throttle itself, but it doesn't
really know how much pre-allocation will need to be done for pgtable
updates.)
> All the knowledge about when to stop pushing into the entity is in the
> driver, and the scheduler obtains all the knowledge about that from the
> driver anyways.
>
> So you could do
>
> if (my_vulkan_condition())
>         drm_sched_entity_push_job();
>
> couldn't you?
It would need to reach in and use the sched's job_scheduled
wait_queue_head_t... if that isn't too ugly, maybe the rest could be
implemented on top of sched. But it seemed like a reasonable thing
for the scheduler to support directly.
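Roughly, the driver-side version would look something like this (just a
sketch; my_enqueued, my_limit and msm_job_credits() are made-up names,
not existing code):

   struct drm_gpu_scheduler *sched = job->base.sched;
   int ret;

   /* Block until enough previously pushed jobs have run: */
   ret = wait_event_interruptible(sched->job_scheduled,
                   atomic_read(&my_enqueued) <= my_limit);
   if (ret)
           return ret;

   atomic_add(msm_job_credits(job), &my_enqueued);
   drm_sched_entity_push_job(&job->base);

plus a matching atomic_sub() after the job actually runs, piggybacking
on the scheduler waking job_scheduled after run_job() (the same wakeup
this patch relies on).  Doable, but the driver ends up poking at
scheduler internals.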
> >
> > Signed-off-by: Rob Clark <robdclark at chromium.org>
> > ---
> >  drivers/gpu/drm/scheduler/sched_entity.c | 16 ++++++++++++++--
> >  drivers/gpu/drm/scheduler/sched_main.c   |  3 +++
> >  include/drm/gpu_scheduler.h              | 13 ++++++++++++-
> >  3 files changed, 29 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> > index dc0e60d2c14b..c5f688362a34 100644
> > --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > @@ -580,11 +580,21 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
> > * under common lock for the struct drm_sched_entity that was set up for
> > * @sched_job in drm_sched_job_init().
> > */
> > -void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
> > +int drm_sched_entity_push_job(struct drm_sched_job *sched_job)
>
> Return code would need to be documented in the docstring, too. If we'd
> go for that solution.
>
> > {
> > struct drm_sched_entity *entity = sched_job->entity;
> > + struct drm_gpu_scheduler *sched = sched_job->sched;
> > bool first;
> > ktime_t submit_ts;
> > + int ret;
> > +
> > + ret = wait_event_interruptible(
> > + sched->job_scheduled,
> > + atomic_read(&sched->enqueue_credit_count) <=
> > + sched->enqueue_credit_limit);
>
> This very significantly changes the function's semantics. This function
> is used in a great many drivers, and here it would be transformed into
> a function that can block.
>
> From what I see below those credits are to be optional. But even if, it
> needs to be clearly documented when a function can block.
Sure. The behavior changes only for drivers that use the
enqueue_credit_limit, so other drivers should be unaffected.
I can improve the docs.
(Maybe push_credit or something else would be a better name than
enqueue_credit?)
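E.g. something like this would make the opt-in explicit and keep it a
strict no-op for everyone else (just a sketch, using the same fields as
the patch):

   /* Only throttle when the driver opted in via enqueue_credit_limit: */
   if (sched->enqueue_credit_limit) {
           ret = wait_event_interruptible(sched->job_scheduled,
                           atomic_read(&sched->enqueue_credit_count) <=
                           sched->enqueue_credit_limit);
           if (ret)
                   return ret;
   }

Behavior-wise it shouldn't differ from the patch (drivers that never set
enqueue_credits keep the count at zero, so the condition is immediately
true and we never sleep), but it makes the intent easier to document.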
>
> > + if (ret)
> > + return ret;
> > + atomic_add(sched_job->enqueue_credits, &sched->enqueue_credit_count);
> >
> > trace_drm_sched_job(sched_job, entity);
> > atomic_inc(entity->rq->sched->score);
> > @@ -609,7 +619,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
> > spin_unlock(&entity->lock);
> >
> > DRM_ERROR("Trying to push to a killed
> > entity\n");
> > - return;
> > + return -EINVAL;
> > }
> >
> > rq = entity->rq;
> > @@ -626,5 +636,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
> >
> > drm_sched_wakeup(sched);
> > }
> > +
> > + return 0;
> > }
> > EXPORT_SYMBOL(drm_sched_entity_push_job);
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index 9412bffa8c74..1102cca69cb4 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -1217,6 +1217,7 @@ static void drm_sched_run_job_work(struct work_struct *w)
> >
> > trace_drm_run_job(sched_job, entity);
> > fence = sched->ops->run_job(sched_job);
> > + atomic_sub(sched_job->enqueue_credits, &sched->enqueue_credit_count);
> > complete_all(&entity->entity_idle);
> > drm_sched_fence_scheduled(s_fence, fence);
> >
> > @@ -1253,6 +1254,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched, const struct drm_sched_init_
> >
> > sched->ops = args->ops;
> > sched->credit_limit = args->credit_limit;
> > + sched->enqueue_credit_limit = args->enqueue_credit_limit;
> > sched->name = args->name;
> > sched->timeout = args->timeout;
> > sched->hang_limit = args->hang_limit;
> > @@ -1308,6 +1310,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched, const struct drm_sched_init_
> > INIT_LIST_HEAD(&sched->pending_list);
> > spin_lock_init(&sched->job_list_lock);
> > atomic_set(&sched->credit_count, 0);
> > + atomic_set(&sched->enqueue_credit_count, 0);
> > INIT_DELAYED_WORK(&sched->work_tdr, drm_sched_job_timedout);
> > INIT_WORK(&sched->work_run_job, drm_sched_run_job_work);
> > INIT_WORK(&sched->work_free_job, drm_sched_free_job_work);
> > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > index da64232c989d..d830ffe083f1 100644
> > --- a/include/drm/gpu_scheduler.h
> > +++ b/include/drm/gpu_scheduler.h
> > @@ -329,6 +329,7 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f);
> > * @s_fence: contains the fences for the scheduling of job.
> > * @finish_cb: the callback for the finished fence.
> > * @credits: the number of credits this job contributes to the scheduler
> > + * @enqueue_credits: the number of enqueue credits this job contributes
> > * @work: Helper to reschedule job kill to different context.
> > * @id: a unique id assigned to each job scheduled on the scheduler.
> > * @karma: increment on every hang caused by this job. If this exceeds the hang
> > @@ -366,6 +367,7 @@ struct drm_sched_job {
> >
> > enum drm_sched_priority s_priority;
> > u32 credits;
> > + u32 enqueue_credits;
>
> What's the policy of setting this?
>
> drm_sched_job_init() and drm_sched_job_arm() are responsible for
> initializing jobs.
It should be set before drm_sched_entity_push_job(). I wouldn't
really expect drivers to know the value at drm_sched_job_init() time.
But they would by the time drm_sched_entity_push_job() is called.
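Ie. roughly this ordering on the driver side (made-up names, just to
show where the value gets filled in):

   ret = drm_sched_job_init(&job->base, &queue->entity, 1, queue);

   /* ... dependency setup, pgtable page preallocation ... */

   /* only known now, after preallocating pgtable pages: */
   job->base.enqueue_credits = job->prealloc_count;

   drm_sched_job_arm(&job->base);
   ret = drm_sched_entity_push_job(&job->base);   /* may now block */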
> > /** @last_dependency: tracks @dependencies as they signal */
> > unsigned int last_dependency;
> > atomic_t karma;
> > @@ -485,6 +487,10 @@ struct drm_sched_backend_ops {
> > * @ops: backend operations provided by the driver.
> > * @credit_limit: the credit limit of this scheduler
> > * @credit_count: the current credit count of this scheduler
> > + * @enqueue_credit_limit: the credit limit of jobs pushed to scheduler and not
> > + * yet run
> > + * @enqueue_credit_count: the current credit count of jobs pushed to scheduler
> > + * but not yet run
> > * @timeout: the time after which a job is removed from the scheduler.
> > * @name: name of the ring for which this scheduler is being used.
> > * @num_rqs: Number of run-queues. This is at most DRM_SCHED_PRIORITY_COUNT,
> > @@ -518,6 +524,8 @@ struct drm_gpu_scheduler {
> > const struct drm_sched_backend_ops *ops;
> > u32 credit_limit;
> > atomic_t credit_count;
> > + u32 enqueue_credit_limit;
> > + atomic_t enqueue_credit_count;
> > long timeout;
> > const char *name;
> > u32 num_rqs;
> > @@ -550,6 +558,8 @@ struct drm_gpu_scheduler {
> > * @num_rqs: Number of run-queues. This may be at most DRM_SCHED_PRIORITY_COUNT,
> > * as there's usually one run-queue per priority, but may be less.
> > * @credit_limit: the number of credits this scheduler can hold from all jobs
> > + * @enqueue_credit_limit: the number of credits that can be enqueued before
> > + * drm_sched_entity_push_job() blocks
>
> Is it optional or not? Can it be deactivated?
>
> It seems to me that it is optional, and so far only used in msm. If
> there are no other parties in need for that mechanism, the right place
> to have this feature probably is msm, which has all the knowledge about
> when to block already.
>
As with the existing credit_limit, it is optional.  Although I think it
would also be useful for other drivers that use drm sched for VM_BIND
queues, for the same reason.
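E.g. a VM_BIND queue could just do something like this (sketch, values
made up):

   const struct drm_sched_init_args args = {
           .ops = &my_vm_bind_sched_ops,
           .name = "vm_bind",
           .credit_limit = 1,
           .enqueue_credit_limit = 1024,  /* throttle pushes past this */
           /* ... */
   };

   ret = drm_sched_init(&sched, &args);

sized to however much pgtable preallocation the driver is willing to
keep outstanding.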
BR,
-R
>
> Regards
> P.
>
>
> > * @hang_limit: number of times to allow a job to hang before dropping it.
> > * This mechanism is DEPRECATED. Set it to 0.
> > * @timeout: timeout value in jiffies for submitted jobs.
> > @@ -564,6 +574,7 @@ struct drm_sched_init_args {
> > struct workqueue_struct *timeout_wq;
> > u32 num_rqs;
> > u32 credit_limit;
> > + u32 enqueue_credit_limit;
> > unsigned int hang_limit;
> > long timeout;
> > atomic_t *score;
> > @@ -600,7 +611,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
> > struct drm_sched_entity *entity,
> > u32 credits, void *owner);
> > void drm_sched_job_arm(struct drm_sched_job *job);
> > -void drm_sched_entity_push_job(struct drm_sched_job *sched_job);
> > +int drm_sched_entity_push_job(struct drm_sched_job *sched_job);
> > int drm_sched_job_add_dependency(struct drm_sched_job *job,
> > struct dma_fence *fence);
> > int drm_sched_job_add_syncobj_dependency(struct drm_sched_job *job,
>