<div dir="ltr">It's very nice of you-all to finger it out that it may crash when it is the last entity in the list. Absolutely I made a mistake about that.<div>But I still cannot understand why we need to restart the selection from the list head when the current entity is removed from rq.</div><div>In drm_sched_rq_select_entity, starting from head may cause the first entity to be selected more often than others, which breaks the equal rule the scheduler wants to achieve.<br></div><div>Maybe the previous one is the better choice when current_entity == entity?</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Luben Tuikov <<a href="mailto:luben.tuikov@amd.com">luben.tuikov@amd.com</a>> 于2022年10月27日周四 16:24写道:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 2022-10-27 04:19, Christian König wrote:<br>
>> On 27.10.22 at 10:07, Luben Tuikov wrote:
>>> On 2022-10-27 03:01, Luben Tuikov wrote:
>>>> On 2022-10-25 13:50, Luben Tuikov wrote:
>>>>> Looking...
>>>>>
>>>>> Regards,
>>>>> Luben
>>>>>
>>>>> On 2022-10-25 09:35, Alex Deucher wrote:
>>>>>> + Luben
>>>>>>
>>>>>> On Tue, Oct 25, 2022 at 2:55 AM brolerliew <brolerliew@gmail.com> wrote:
>>>>>>> When entity move from one rq to another, current_entity will be set to NULL
>>>>>>> if it is the moving entity. This make entities close to rq head got
>>>>>>> selected more frequently, especially when doing load balance between
>>>>>>> multiple drm_gpu_scheduler.
>>>>>>>
>>>>>>> Make current_entity to next when removing from rq.
>>>>>>>
>>>>>>> Signed-off-by: brolerliew <brolerliew@gmail.com>
>>>>>>> ---
>>>>>>>  drivers/gpu/drm/scheduler/sched_main.c | 5 +++--
>>>>>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>> index 2fab218d7082..00b22cc50f08 100644
>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>> @@ -168,10 +168,11 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
>>>>>>>  	spin_lock(&rq->lock);
>>>>>>>
>>>>>>>  	atomic_dec(rq->sched->score);
>>>>>>> -	list_del_init(&entity->list);
>>>>>>>
>>>>>>>  	if (rq->current_entity == entity)
>>>>>>> -		rq->current_entity = NULL;
>>>>>>> +		rq->current_entity = list_next_entry(entity, list);
>>>>>>> +
>>>>>>> +	list_del_init(&entity->list);
>>>>>>>
>>>>>>>  	if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
>>>>>>>  		drm_sched_rq_remove_fifo_locked(entity);
>>>>>>> --
>>>>>>> 2.34.1
>>>>>>>
>>>> Looks good. I'll pick it up into some other changes I've in tow, and repost
>>>> along with my changes, as they're somewhat related.
>>> Actually, the more I look at it, the more I think that we do want to set
>>> rq->current_entity to NULL in that function, in order to pick the next best entity
>>> (or scheduler for that matter), the next time around. See sched_entity.c,
>>> and drm_sched_rq_select_entity() where we start evaluating from the _next_
>>> entity.
>>>
>>> So, it is best to leave it to set it to NULL, for now.
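For context, the round-robin path in drm_sched_rq_select_entity() resumes from
the entry after rq->current_entity and only falls back to the list head when
current_entity is NULL or the first pass finds nothing ready. Roughly (a
simplified paraphrase, not a verbatim copy of sched_main.c; locking and the
entity_idle handling are omitted):

struct drm_sched_entity *entity = rq->current_entity;

if (entity) {
	/* Resume the walk right after the last selected entity. */
	list_for_each_entry_continue(entity, &rq->entities, list) {
		if (drm_sched_entity_is_ready(entity)) {
			rq->current_entity = entity;
			return entity;
		}
	}
}

/* Wrap around: scan from the head up to and including current_entity. */
list_for_each_entry(entity, &rq->entities, list) {
	if (drm_sched_entity_is_ready(entity)) {
		rq->current_entity = entity;
		return entity;
	}
	if (entity == rq->current_entity)
		break;
}

return NULL;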
>>
>> Apart from that, this patch could cause a crash when the entity is
>> the last one in the list.
>>
>> In this case, current_entity would be set to an incorrect upcast
>> of the head of the list.
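To make that failure mode concrete: list_next_entry(entity, list) is a
container_of() on entity->list.next, and for the last element that next
pointer is the list head embedded in the rq (&rq->entities), so the resulting
"entity" is a bogus pointer into the rq itself. A variant of the patch would
have to guard that case, roughly like this (sketch only, untested):

	if (rq->current_entity == entity) {
		/* The last element has no real successor; fall back to NULL. */
		if (list_is_last(&entity->list, &rq->entities))
			rq->current_entity = NULL;
		else
			rq->current_entity = list_next_entry(entity, list);
	}

	list_del_init(&entity->list);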
>
> Absolutely. I saw that, but in rejecting the patch, I didn't feel the need to mention it.
>
> Thanks for looking into this.
>
> Regards,
> Luben
>
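
As mentioned at the top of this mail, here is a rough sketch of the "previous
entity" idea for drm_sched_rq_remove_entity(). It is untested, and the guard
for the first/only element is my assumption rather than existing code:

	if (rq->current_entity == entity) {
		/*
		 * Point current_entity at the entry before the removed one, so
		 * that drm_sched_rq_select_entity(), which resumes from the
		 * entry *after* current_entity, continues with the removed
		 * entity's successor instead of restarting from the list head.
		 *
		 * If the removed entity is the first (or only) element there is
		 * no valid previous entry; fall back to NULL, in which case
		 * selection restarts from the head, which after the removal is
		 * the old successor anyway.
		 */
		if (list_is_first(&entity->list, &rq->entities))
			rq->current_entity = NULL;
		else
			rq->current_entity = list_prev_entry(entity, list);
	}

	list_del_init(&entity->list);

Whether this is worth the extra complexity over simply resetting
current_entity to NULL is exactly the question above.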