[PATCH] drm/scheduler: set current_entity to next when remove from rq

Thu Oct 27 09:00:48 UTC 2022

Thank you all for pointing out that the patch may crash when the entity is
the last one in the list. That was definitely my mistake.
But I still don't understand why the selection has to restart from the
list head when the current entity is removed from the rq.
In drm_sched_rq_select_entity(), starting from the head may cause the first
entity to be selected more often than the others, which breaks the fairness
the scheduler is trying to achieve.
Maybe setting current_entity to the previous entity would be the better
choice when current_entity == entity?
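
To make the question concrete, here is a minimal userspace sketch of the two
options. It is not the kernel code: the toy_* names, the simplified list
type and the policy enum are all made up for illustration, and every entity
is assumed to be ready, so selection is plain round robin.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct list_head (assumption). */
struct list_head { struct list_head *next, *prev; };

#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_entity { int id; struct list_head list; };
struct toy_rq { struct list_head entities; struct toy_entity *current_entity; };

/* Hypothetical switch between the two behaviours discussed in this thread. */
enum toy_policy { TOY_RESET_TO_NULL, TOY_STEP_TO_PREV };

static void toy_rq_init(struct toy_rq *rq)
{
	assert(rq);
	rq->entities.next = rq->entities.prev = &rq->entities;
	rq->current_entity = NULL;
}

/* Append an entity at the tail of the run queue. */
static void toy_rq_add(struct toy_rq *rq, struct toy_entity *e)
{
	assert(rq && e);
	e->list.prev = rq->entities.prev;
	e->list.next = &rq->entities;
	rq->entities.prev->next = &e->list;
	rq->entities.prev = &e->list;
}

/* Round robin: continue after current_entity, or from the head if NULL. */
static struct toy_entity *toy_rq_select(struct toy_rq *rq)
{
	struct list_head *pos;

	assert(rq);
	pos = rq->current_entity ? rq->current_entity->list.next
				 : rq->entities.next;
	if (pos == &rq->entities)	/* ran off the end, wrap to the head */
		pos = rq->entities.next;
	if (pos == &rq->entities)	/* list is empty */
		return NULL;
	rq->current_entity = toy_container_of(pos, struct toy_entity, list);
	return rq->current_entity;
}

static void toy_rq_remove(struct toy_rq *rq, struct toy_entity *e,
			  enum toy_policy policy)
{
	assert(rq && e);
	if (rq->current_entity == e) {
		/* Never upcast the list head: fall back to NULL in that case. */
		if (policy == TOY_STEP_TO_PREV && e->list.prev != &rq->entities)
			rq->current_entity = toy_container_of(e->list.prev,
						struct toy_entity, list);
		else
			rq->current_entity = NULL;
	}
	e->list.prev->next = e->list.next;
	e->list.next->prev = e->list.prev;
	e->list.next = e->list.prev = &e->list;
}
```

With four entities 0..3, selecting twice and then removing entity 1: under
TOY_RESET_TO_NULL the next selection is entity 0 again, while under
TOY_STEP_TO_PREV it is entity 2, which is what I would expect from a fair
round robin.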

Luben Tuikov <luben.tuikov at amd.com> wrote on Thu, Oct 27, 2022 at 16:24:

> On 2022-10-27 04:19, Christian König wrote:
> > Am 27.10.22 um 10:07 schrieb Luben Tuikov:
> >> On 2022-10-27 03:01, Luben Tuikov wrote:
> >>> On 2022-10-25 13:50, Luben Tuikov wrote:
> >>>> Looking...
> >>>>
> >>>> Regards,
> >>>> Luben
> >>>>
> >>>> On 2022-10-25 09:35, Alex Deucher wrote:
> >>>>> + Luben
> >>>>>
> >>>>> On Tue, Oct 25, 2022 at 2:55 AM brolerliew <brolerliew at gmail.com> wrote:
> >>>>>> When entity move from one rq to another, current_entity will be set to NULL
> >>>>>> if it is the moving entity. This make entities close to rq head got
> >>>>>> selected more frequently, especially when doing load balance between
> >>>>>> multiple drm_gpu_scheduler.
> >>>>>>
> >>>>>> Make current_entity to next when removing from rq.
> >>>>>>
> >>>>>> Signed-off-by: brolerliew <brolerliew at gmail.com>
> >>>>>> ---
> >>>>>>   drivers/gpu/drm/scheduler/sched_main.c | 5 +++--
> >>>>>>   1 file changed, 3 insertions(+), 2 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> >>>>>> index 2fab218d7082..00b22cc50f08 100644
> >>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
> >>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> >>>>>> @@ -168,10 +168,11 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
> >>>>>>          spin_lock(&rq->lock);
> >>>>>>
> >>>>>>          atomic_dec(rq->sched->score);
> >>>>>> -       list_del_init(&entity->list);
> >>>>>>
> >>>>>>          if (rq->current_entity == entity)
> >>>>>> -               rq->current_entity = NULL;
> >>>>>> +               rq->current_entity = list_next_entry(entity, list);
> >>>>>> +
> >>>>>> +       list_del_init(&entity->list);
> >>>>>>
> >>>>>>          if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
> >>>>>>                  drm_sched_rq_remove_fifo_locked(entity);
> >>>>>> --
> >>>>>> 2.34.1
> >>>>>>
> >>> Looks good. I'll pick it up into some other changes I've in tow, and repost
> >>> along with my changes, as they're somewhat related.
> >> Actually, the more I look at it, the more I think that we do want to set
> >> rq->current_entity to NULL in that function, in order to pick the next best entity
> >> (or scheduler for that matter), the next time around. See sched_entity.c,
> >> and drm_sched_rq_select_entity() where we start evaluating from the _next_
> >> entity.
> >>
> >> So, it is best to leave it to set it to NULL, for now.
> >
> > Apart from that this patch here could cause a crash when the entity is
> > the last one in the list.
> >
> > In this case current current_entity would be set to an incorrect upcast
> > of the head of the list.
>
> Absolutely. I saw that, but in rejecting the patch, I didn't feel the need
> to mention it.
>
> Thanks for looking into this.
>
> Regards,
> Luben
>
>
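
For reference, the last-entity problem mentioned above can be sketched in
userspace as follows. This is only an illustration under my own assumptions:
the demo_* names and struct layouts are invented and are not the real
drm_sched_rq or drm_sched_entity. The point is that the node after the last
entity is the list head embedded in the rq, so a container_of()-style upcast
of it (which is what list_next_entry() does) would not yield a valid entity.
A helper has to check for the head first.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified circular list node (stand-in for struct list_head). */
struct demo_node { struct demo_node *next, *prev; };

/* Invented layouts: the head lives in the rq, the nodes in the entities. */
struct demo_rq { int lock_placeholder; struct demo_node entities; };
struct demo_entity { int id; struct demo_node list; };

static void demo_rq_init(struct demo_rq *rq)
{
	assert(rq);
	rq->lock_placeholder = 0;
	rq->entities.next = rq->entities.prev = &rq->entities;
}

/* Append an entity at the tail of the list. */
static void demo_add_tail(struct demo_rq *rq, struct demo_entity *e)
{
	assert(rq && e);
	e->list.prev = rq->entities.prev;
	e->list.next = &rq->entities;
	rq->entities.prev->next = &e->list;
	rq->entities.prev = &e->list;
}

/*
 * Return the entity after e, or NULL when e is the last one. Without the
 * head check, the upcast below would be applied to &rq->entities, which is
 * embedded in a demo_rq and not in a demo_entity.
 */
static struct demo_entity *demo_next_or_null(struct demo_rq *rq,
					     struct demo_entity *e)
{
	struct demo_node *n;

	assert(rq && e);
	n = e->list.next;
	if (n == &rq->entities)
		return NULL;
	return (struct demo_entity *)((char *)n -
				      offsetof(struct demo_entity, list));
}
```

With two entities a and b added in that order, demo_next_or_null() returns
&b for a and NULL for b, because b.list.next is the head inside the rq.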