AW: [PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()

Mon Jul 22 08:12:43 UTC 2024

[AMD Official Use Only - AMD Internal Distribution Only]

Sorry if I mangled the reply. AMDs mail servers seem to have a hickup this morning and I have to use OWA.

________________________________________
Von: Matthew Brost <matthew.brost at intel.com>
Gesendet: Freitag, 19. Juli 2024 19:39
An: Christian König
Cc: Tvrtko Ursulin; dri-devel at lists.freedesktop.org; kernel-dev at igalia.com; Tvrtko Ursulin; Koenig, Christian; Deucher, Alexander; Luben Tuikov; Daniel Vetter; amd-gfx at lists.freedesktop.org; stable at vger.kernel.org
Betreff: Re: [PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()

On Fri, Jul 19, 2024 at 05:18:05PM +0200, Christian König wrote:
> Am 19.07.24 um 15:02 schrieb Christian König:
> > Am 19.07.24 um 11:47 schrieb Tvrtko Ursulin:
> > > From: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
> > >
> > > Long time ago in commit b3ac17667f11 ("drm/scheduler: rework entity
> > > creation") a change was made which prevented priority changes for
> > > entities
> > > with only one assigned scheduler.
> > >
> > > The commit reduced drm_sched_entity_set_priority() to simply update the
> > > entities priority, but the run queue selection logic in
> > > drm_sched_entity_select_rq() was never able to actually change the
> > > originally assigned run queue.
> > >
> > > In practice that only affected amdgpu, being the only driver which
> > > can do
> > > dynamic priority changes. And that appears was attempted to be rectified
> > > there in 2316a86bde49 ("drm/amdgpu: change hw sched list on ctx priority
> > > override").
> > >
> > > A few unresolved problems however were that this only fixed
> > > drm_sched_entity_set_priority() *if* drm_sched_entity_modify_sched() was
> > > called first. That was not documented anywhere.
> > >
> > > Secondly, this only works if drm_sched_entity_modify_sched() is actually
> > > called, which in amdgpu's case today is true only for gfx and compute.
> > > Priority changes for other engines with only one scheduler assigned,
> > > such
> > > as jpeg and video decode will still not work.
> > >
> > > Note that this was also noticed in 981b04d96856 ("drm/sched: improve
> > > docs
> > > around drm_sched_entity").
> > >
> > > Completely different set of non-obvious confusion was that whereas
> > > drm_sched_entity_init() was not keeping the passed in list of schedulers
> > > (courtesy of 8c23056bdc7a ("drm/scheduler: do not keep a copy of sched
> > > list")), drm_sched_entity_modify_sched() was disagreeing with that and
> > > would simply assign the single item list.
> > >
> > > That incosistency appears to be semi-silently fixed in ac4eb83ab255
> > > ("drm/sched: select new rq even if there is only one v3").
> > >
> > > What was also not documented is why it was important to not keep the
> > > list of schedulers when there is only one. I suspect it could have
> > > something to do with the fact the passed in array is on stack for many
> > > callers with just one scheduler. With more than one scheduler amdgpu is
> > > the only caller, and there the container is not on the stack. Keeping a
> > > stack backed list in the entity would obviously be undefined behaviour
> > > *if* the list was kept.
> > >
> > > Amdgpu however did only stop passing in stack backed container for
> > > the more
> > > than one scheduler case in 977f7e1068be ("drm/amdgpu: allocate
> > > entities on
> > > demand"). Until then I suspect dereferencing freed stack from
> > > drm_sched_entity_select_rq() was still present.
> > >
> > > In order to untangle all that and fix priority changes this patch is
> > > bringing back the entity owned container for storing the passed in
> > > scheduler list.
> >
> > Please don't. That makes the mess just more horrible.
> >
> > The background of not keeping the array is to intentionally prevent the
> > priority override from working.
> >
> > The bug is rather that adding drm_sched_entity_modify_sched() messed
> > this up.
>
> To give more background: Amdgpu has two different ways of handling priority:
> 1. The priority in the DRM scheduler.
> 2. Different HW rings with different priorities.
>
> Your analysis is correct that drm_sched_entity_init() initially dropped the
> scheduler list to avoid using a stack allocated list, and that functionality
> is still used in amdgpu_ctx_init_entity() for example.
>
> Setting the scheduler priority was basically just a workaround because we
> didn't had the hw priorities at that time. Since that is no longer the case
> I suggest to just completely drop the drm_sched_entity_set_priority()
> function instead.
>

+1 on this idea of dropping drm_sched_entity_set_priority if it doesn't
really work and unused.

[CK]
The problem is it kind of works, but only because of coincident and not because of good engineering.

It certainly unused in Xe and Xe has HW rings with different priorities
via the GuC interface. I belive this is also true for all new drivers
based on my interactions.

[CK]
Thx, good to know.

We should not be adding complexity the scheduler without a use case.

[CK]
Completely agree. But just for the record: Tvrtko, it's really appreciated that you looked into that.

This patch here and an internal comment from Vitaly made me realize that the scheduler had another problem in the timeout handling which allows drivers to happily create infinity fences.

*sigh* going to send a patch to fix that,
Christian.

Matt

> In general scheduler priorities were meant to be used for things like kernel
> queues which would always have higher priority than user space submissions
> and using them for userspace turned out to be not such a good idea.
>
> Regards,
> Christian.
>
> >
> > Regards,
> > Christian.
> >
> >
> > > Container is now owned by the entity and the pointers are
> > > owned by the drivers. List of schedulers is always kept including
> > > for the
> > > one scheduler case.
> > >
> > > The patch can therefore also removes the single scheduler special case,
> > > which means that priority changes should now work (be able to change the
> > > selected run-queue) for all drivers and engines. In other words
> > > drm_sched_entity_set_priority() should now just work for all cases.
> > >
> > > To enable maintaining its own container some API calls needed to grow a
> > > capability for returning success/failure, which is a change which
> > > percolates mostly through amdgpu source.
> > >
> > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
> > > Fixes: b3ac17667f11 ("drm/scheduler: rework entity creation")
> > > References: 8c23056bdc7a ("drm/scheduler: do not keep a copy of
> > > sched list")
> > > References: 977f7e1068be ("drm/amdgpu: allocate entities on demand")
> > > References: 2316a86bde49 ("drm/amdgpu: change hw sched list on ctx
> > > priority override")
> > > References: ac4eb83ab255 ("drm/sched: select new rq even if there is
> > > only one v3")
> > > References: 981b04d96856 ("drm/sched: improve docs around
> > > drm_sched_entity")
> > > Cc: Christian König <christian.koenig at amd.com>
> > > Cc: Alex Deucher <alexander.deucher at amd.com>
> > > Cc: Luben Tuikov <ltuikov89 at gmail.com>
> > > Cc: Matthew Brost <matthew.brost at intel.com>
> > > Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> > > Cc: amd-gfx at lists.freedesktop.org
> > > Cc: dri-devel at lists.freedesktop.org
> > > Cc: <stable at vger.kernel.org> # v5.6+
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   | 31 +++++---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h   |  2 +-
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c | 13 +--
> > >   drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c   |  3 +-
> > >   drivers/gpu/drm/scheduler/sched_entity.c  | 96 ++++++++++++++++-------
> > >   include/drm/gpu_scheduler.h               | 16 ++--
> > >   6 files changed, 100 insertions(+), 61 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> > > index 5cb33ac99f70..387247f8307e 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> > > @@ -802,15 +802,15 @@ struct dma_fence *amdgpu_ctx_get_fence(struct
> > > amdgpu_ctx *ctx,
> > >       return fence;
> > >   }
> > >   -static void amdgpu_ctx_set_entity_priority(struct amdgpu_ctx *ctx,
> > > -                       struct amdgpu_ctx_entity *aentity,
> > > -                       int hw_ip,
> > > -                       int32_t priority)
> > > +static int amdgpu_ctx_set_entity_priority(struct amdgpu_ctx *ctx,
> > > +                      struct amdgpu_ctx_entity *aentity,
> > > +                      int hw_ip,
> > > +                      int32_t priority)
> > >   {
> > >       struct amdgpu_device *adev = ctx->mgr->adev;
> > > -    unsigned int hw_prio;
> > >       struct drm_gpu_scheduler **scheds = NULL;
> > > -    unsigned num_scheds;
> > > +    unsigned int hw_prio, num_scheds;
> > > +    int ret = 0;
> > >         /* set sw priority */
> > >       drm_sched_entity_set_priority(&aentity->entity,
> > > @@ -822,16 +822,18 @@ static void
> > > amdgpu_ctx_set_entity_priority(struct amdgpu_ctx *ctx,
> > >           hw_prio = array_index_nospec(hw_prio, AMDGPU_RING_PRIO_MAX);
> > >           scheds = adev->gpu_sched[hw_ip][hw_prio].sched;
> > >           num_scheds = adev->gpu_sched[hw_ip][hw_prio].num_scheds;
> > > -        drm_sched_entity_modify_sched(&aentity->entity, scheds,
> > > -                          num_scheds);
> > > +        ret = drm_sched_entity_modify_sched(&aentity->entity, scheds,
> > > +                            num_scheds);
> > >       }
> > > +
> > > +    return ret;
> > >   }
> > >   -void amdgpu_ctx_priority_override(struct amdgpu_ctx *ctx,
> > > -                  int32_t priority)
> > > +int amdgpu_ctx_priority_override(struct amdgpu_ctx *ctx, int32_t
> > > priority)
> > >   {
> > >       int32_t ctx_prio;
> > >       unsigned i, j;
> > > +    int ret;
> > >         ctx->override_priority = priority;
> > >   @@ -842,10 +844,15 @@ void amdgpu_ctx_priority_override(struct
> > > amdgpu_ctx *ctx,
> > >               if (!ctx->entities[i][j])
> > >                   continue;
> > >   -            amdgpu_ctx_set_entity_priority(ctx, ctx->entities[i][j],
> > > -                               i, ctx_prio);
> > > +            ret = amdgpu_ctx_set_entity_priority(ctx,
> > > +                                 ctx->entities[i][j],
> > > +                                 i, ctx_prio);
> > > +            if (ret)
> > > +                return ret;
> > >           }
> > >       }
> > > +
> > > +    return 0;
> > >   }
> > >     int amdgpu_ctx_wait_prev_fence(struct amdgpu_ctx *ctx,
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
> > > index 85376baaa92f..835661515e33 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
> > > @@ -82,7 +82,7 @@ struct dma_fence *amdgpu_ctx_get_fence(struct
> > > amdgpu_ctx *ctx,
> > >                          struct drm_sched_entity *entity,
> > >                          uint64_t seq);
> > >   bool amdgpu_ctx_priority_is_valid(int32_t ctx_prio);
> > > -void amdgpu_ctx_priority_override(struct amdgpu_ctx *ctx, int32_t
> > > ctx_prio);
> > > +int amdgpu_ctx_priority_override(struct amdgpu_ctx *ctx, int32_t
> > > ctx_prio);
> > >     int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
> > >                struct drm_file *filp);
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
> > > index 863b2a34b2d6..944edb7f00a2 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
> > > @@ -54,12 +54,15 @@ static int
> > > amdgpu_sched_process_priority_override(struct amdgpu_device *adev,
> > >         mgr = &fpriv->ctx_mgr;
> > >       mutex_lock(&mgr->lock);
> > > -    idr_for_each_entry(&mgr->ctx_handles, ctx, id)
> > > -        amdgpu_ctx_priority_override(ctx, priority);
> > > +    idr_for_each_entry(&mgr->ctx_handles, ctx, id) {
> > > +        r = amdgpu_ctx_priority_override(ctx, priority);
> > > +        if (r)
> > > +            break;
> > > +    }
> > >       mutex_unlock(&mgr->lock);
> > >         fdput(f);
> > > -    return 0;
> > > +    return r;
> > >   }
> > >     static int amdgpu_sched_context_priority_override(struct
> > > amdgpu_device *adev,
> > > @@ -88,11 +91,11 @@ static int
> > > amdgpu_sched_context_priority_override(struct amdgpu_device *adev,
> > >           return -EINVAL;
> > >       }
> > >   -    amdgpu_ctx_priority_override(ctx, priority);
> > > +    r = amdgpu_ctx_priority_override(ctx, priority);
> > >       amdgpu_ctx_put(ctx);
> > >       fdput(f);
> > >   -    return 0;
> > > +    return r;
> > >   }
> > >     int amdgpu_sched_ioctl(struct drm_device *dev, void *data,
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
> > > b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
> > > index 81fb99729f37..2453decc73c7 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
> > > @@ -1362,8 +1362,7 @@ static int vcn_v4_0_5_limit_sched(struct
> > > amdgpu_cs_parser *p,
> > >         scheds = p->adev->gpu_sched[AMDGPU_HW_IP_VCN_ENC]
> > >           [AMDGPU_RING_PRIO_0].sched;
> > > -    drm_sched_entity_modify_sched(job->base.entity, scheds, 1);
> > > -    return 0;
> > > +    return drm_sched_entity_modify_sched(job->base.entity, scheds, 1);
> > >   }
> > >     static int vcn_v4_0_5_dec_msg(struct amdgpu_cs_parser *p, struct
> > > amdgpu_job *job,
> > > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
> > > b/drivers/gpu/drm/scheduler/sched_entity.c
> > > index 58c8161289fe..cb5cc65f461d 100644
> > > --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > > +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > > @@ -45,7 +45,12 @@
> > >    * @guilty: atomic_t set to 1 when a job on this queue
> > >    *          is found to be guilty causing a timeout
> > >    *
> > > - * Note that the &sched_list must have at least one element to
> > > schedule the entity.
> > > + * Note that the &sched_list must have at least one element to
> > > schedule the
> > > + * entity.
> > > + *
> > > + * The individual drm_gpu_scheduler pointers have borrow semantics, ie.
> > > + * they must remain valid during entities lifetime, while the
> > > containing
> > > + * array can be freed after this call returns.
> > >    *
> > >    * For changing @priority later on at runtime see
> > >    * drm_sched_entity_set_priority(). For changing the set of schedulers
> > > @@ -69,27 +74,24 @@ int drm_sched_entity_init(struct
> > > drm_sched_entity *entity,
> > >       INIT_LIST_HEAD(&entity->list);
> > >       entity->rq = NULL;
> > >       entity->guilty = guilty;
> > > -    entity->num_sched_list = num_sched_list;
> > >       entity->priority = priority;
> > > -    /*
> > > -     * It's perfectly valid to initialize an entity without having
> > > a valid
> > > -     * scheduler attached. It's just not valid to use the scheduler
> > > before it
> > > -     * is initialized itself.
> > > -     */
> > > -    entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
> > >       RCU_INIT_POINTER(entity->last_scheduled, NULL);
> > >       RB_CLEAR_NODE(&entity->rb_tree_node);
> > >   -    if (num_sched_list && !sched_list[0]->sched_rq) {
> > > -        /* Since every entry covered by num_sched_list
> > > -         * should be non-NULL and therefore we warn drivers
> > > -         * not to do this and to fix their DRM calling order.
> > > -         */
> > > -        pr_warn("%s: called with uninitialized scheduler\n", __func__);
> > > -    } else if (num_sched_list) {
> > > -        /* The "priority" of an entity cannot exceed the number of
> > > run-queues of a
> > > -         * scheduler. Protect against num_rqs being 0, by
> > > converting to signed. Choose
> > > -         * the lowest priority available.
> > > +    if (num_sched_list) {
> > > +        int ret;
> > > +
> > > +        ret = drm_sched_entity_modify_sched(entity,
> > > +                            sched_list,
> > > +                            num_sched_list);
> > > +        if (ret)
> > > +            return ret;
> > > +
> > > +        /*
> > > +         * The "priority" of an entity cannot exceed the number of
> > > +         * run-queues of a scheduler. Protect against num_rqs being 0,
> > > +         * by converting to signed. Choose the lowest priority
> > > +         * available.
> > >            */
> > >           if (entity->priority >= sched_list[0]->num_rqs) {
> > >               drm_err(sched_list[0], "entity with out-of-bounds
> > > priority:%u num_rqs:%u\n",
> > > @@ -122,19 +124,58 @@ EXPORT_SYMBOL(drm_sched_entity_init);
> > >    *         existing entity->sched_list
> > >    * @num_sched_list: number of drm sched in sched_list
> > >    *
> > > + * The individual drm_gpu_scheduler pointers have borrow semantics, ie.
> > > + * they must remain valid during entities lifetime, while the
> > > containing
> > > + * array can be freed after this call returns.
> > > + *
> > >    * Note that this must be called under the same common lock for
> > > @entity as
> > >    * drm_sched_job_arm() and drm_sched_entity_push_job(), or the
> > > driver needs to
> > >    * guarantee through some other means that this is never called
> > > while new jobs
> > >    * can be pushed to @entity.
> > > + *
> > > + * Returns zero on success and a negative error code on failure.
> > >    */
> > > -void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
> > > -                    struct drm_gpu_scheduler **sched_list,
> > > -                    unsigned int num_sched_list)
> > > +int drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
> > > +                  struct drm_gpu_scheduler **sched_list,
> > > +                  unsigned int num_sched_list)
> > >   {
> > > -    WARN_ON(!num_sched_list || !sched_list);
> > > +    struct drm_gpu_scheduler **new, **old;
> > > +    unsigned int i;
> > >   -    entity->sched_list = sched_list;
> > > +    if (!(entity && sched_list && (num_sched_list == 0 ||
> > > sched_list[0])))
> > > +        return -EINVAL;
> > > +
> > > +    /*
> > > +     * It's perfectly valid to initialize an entity without having
> > > a valid
> > > +     * scheduler attached. It's just not valid to use the scheduler
> > > before
> > > +     * it is initialized itself.
> > > +     *
> > > +     * Since every entry covered by num_sched_list should be
> > > non-NULL and
> > > +     * therefore we warn drivers not to do this and to fix their
> > > DRM calling
> > > +     * order.
> > > +     */
> > > +    for (i = 0; i < num_sched_list; i++) {
> > > +        if (!sched_list[i]) {
> > > +            pr_warn("%s: called with uninitialized scheduler %u!\n",
> > > +                __func__, i);
> > > +            return -EINVAL;
> > > +        }
> > > +    }
> > > +
> > > +    new = kmemdup_array(sched_list,
> > > +                num_sched_list,
> > > +                sizeof(*sched_list),
> > > +                GFP_KERNEL);
> > > +    if (!new)
> > > +        return -ENOMEM;
> > > +
> > > +    old = entity->sched_list;
> > > +    entity->sched_list = new;
> > >       entity->num_sched_list = num_sched_list;
> > > +
> > > +    kfree(old);
> > > +
> > > +    return 0;
> > >   }
> > >   EXPORT_SYMBOL(drm_sched_entity_modify_sched);
> > >   @@ -341,6 +382,8 @@ void drm_sched_entity_fini(struct
> > > drm_sched_entity *entity)
> > > dma_fence_put(rcu_dereference_check(entity->last_scheduled, true));
> > >       RCU_INIT_POINTER(entity->last_scheduled, NULL);
> > > +
> > > +    kfree(entity->sched_list);
> > >   }
> > >   EXPORT_SYMBOL(drm_sched_entity_fini);
> > >   @@ -531,10 +574,6 @@ void drm_sched_entity_select_rq(struct
> > > drm_sched_entity *entity)
> > >       struct drm_gpu_scheduler *sched;
> > >       struct drm_sched_rq *rq;
> > >   -    /* single possible engine and already selected */
> > > -    if (!entity->sched_list)
> > > -        return;
> > > -
> > >       /* queue non-empty, stay on the same engine */
> > >       if (spsc_queue_count(&entity->job_queue))
> > >           return;
> > > @@ -561,9 +600,6 @@ void drm_sched_entity_select_rq(struct
> > > drm_sched_entity *entity)
> > >           entity->rq = rq;
> > >       }
> > >       spin_unlock(&entity->rq_lock);
> > > -
> > > -    if (entity->num_sched_list == 1)
> > > -        entity->sched_list = NULL;
> > >   }
> > >     /**
> > > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > > index 5acc64954a88..09e1d063a5c0 100644
> > > --- a/include/drm/gpu_scheduler.h
> > > +++ b/include/drm/gpu_scheduler.h
> > > @@ -110,18 +110,12 @@ struct drm_sched_entity {
> > >       /**
> > >        * @sched_list:
> > >        *
> > > -     * A list of schedulers (struct drm_gpu_scheduler).  Jobs from
> > > this entity can
> > > -     * be scheduled on any scheduler on this list.
> > > +     * A list of schedulers (struct drm_gpu_scheduler).  Jobs from this
> > > +     * entity can be scheduled on any scheduler on this list.
> > >        *
> > >        * This can be modified by calling
> > > drm_sched_entity_modify_sched().
> > >        * Locking is entirely up to the driver, see the above
> > > function for more
> > >        * details.
> > > -     *
> > > -     * This will be set to NULL if &num_sched_list equals 1 and @rq
> > > has been
> > > -     * set already.
> > > -     *
> > > -     * FIXME: This means priority changes through
> > > -     * drm_sched_entity_set_priority() will be lost henceforth in
> > > this case.
> > >        */
> > >       struct drm_gpu_scheduler        **sched_list;
> > >   @@ -568,9 +562,9 @@ int
> > > drm_sched_job_add_implicit_dependencies(struct drm_sched_job *job,
> > >                           bool write);
> > >     -void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
> > > -                    struct drm_gpu_scheduler **sched_list,
> > > -                                   unsigned int num_sched_list);
> > > +int drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
> > > +                  struct drm_gpu_scheduler **sched_list,
> > > +                  unsigned int num_sched_list);
> > >     void drm_sched_tdr_queue_imm(struct drm_gpu_scheduler *sched);
> > >   void drm_sched_job_cleanup(struct drm_sched_job *job);
> >
>