<div dir="ltr">It's very nice of you-all to finger it out that it may crash when it is the last entity in the list. Absolutely I made a mistake about that.<div>But I still cannot understand why we need to restart the selection from the list head when the current entity is removed from rq.</div><div>In drm_sched_rq_select_entity, starting from head may cause the first entity to be selected more often than others, which breaks the equal rule the scheduler wants to achieve. </div><div>Maybe the previous one is the better choice when current_entity == entity?</div></div> <div class="gmail_quote"><div dir="ltr" class="gmail_attr">Luben Tuikov <<a href="mailto:luben.tuikov@amd.com">luben.tuikov@amd.com</a>> 于2022年10月27日周四 16:24写道： </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 2022-10-27 04:19, Christian König wrote: > Am 27.10.22 um 10:07 schrieb Luben Tuikov: >> On 2022-10-27 03:01, Luben Tuikov wrote: >>> On 2022-10-25 13:50, Luben Tuikov wrote: >>>> Looking... >>>> >>>> Regards, >>>> Luben >>>> >>>> On 2022-10-25 09:35, Alex Deucher wrote: >>>>> + Luben >>>>> >>>>> On Tue, Oct 25, 2022 at 2:55 AM brolerliew <<a href="mailto:brolerliew@gmail.com" target="_blank">brolerliew@gmail.com</a>> wrote: >>>>>> When entity move from one rq to another, current_entity will be set to NULL >>>>>> if it is the moving entity. This make entities close to rq head got >>>>>> selected more frequently, especially when doing load balance between >>>>>> multiple drm_gpu_scheduler. >>>>>> >>>>>> Make current_entity to next when removing from rq. >>>>>> >>>>>> Signed-off-by: brolerliew <<a href="mailto:brolerliew@gmail.com" target="_blank">brolerliew@gmail.com</a>> >>>>>> --- >>>>>> drivers/gpu/drm/scheduler/sched_main.c | 5 +++-- >>>>>> 1 file changed, 3 insertions(+), 2 deletions(-) >>>>>> >>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c >>>>>> index 2fab218d7082..00b22cc50f08 100644 >>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c >>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c >>>>>> @@ -168,10 +168,11 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq, >>>>>> spin_lock(&rq->lock); >>>>>> >>>>>> atomic_dec(rq->sched->score); >>>>>> - list_del_init(&entity->list); >>>>>> >>>>>> if (rq->current_entity == entity) >>>>>> - rq->current_entity = NULL; >>>>>> + rq->current_entity = list_next_entry(entity, list); >>>>>> + >>>>>> + list_del_init(&entity->list); >>>>>> >>>>>> if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) >>>>>> drm_sched_rq_remove_fifo_locked(entity); >>>>>> -- >>>>>> 2.34.1 >>>>>> >>> Looks good. I'll pick it up into some other changes I've in tow, and repost >>> along with my changes, as they're somewhat related. >> Actually, the more I look at it, the more I think that we do want to set >> rq->current_entity to NULL in that function, in order to pick the next best entity >> (or scheduler for that matter), the next time around. See sched_entity.c, >> and drm_sched_rq_select_entity() where we start evaluating from the _next_ >> entity. >> >> So, it is best to leave it to set it to NULL, for now. > > Apart from that this patch here could cause a crash when the entity is > the last one in the list. > > In this case current current_entity would be set to an incorrect upcast > of the head of the list. Absolutely. I saw that, but in rejecting the patch, I didn't feel the need to mention it. Thanks for looking into this. Regards, Luben </blockquote></div>