[PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()

Christian König ckoenig.leichtzumerken at gmail.com
Wed Jul 24 11:16:11 UTC 2024


Am 24.07.24 um 10:16 schrieb Tvrtko Ursulin:
> [SNIP]
>>
>> Absolutely.
>
> Absolutely good and absolutely me, or absolutely you? :)

You, I don't even have time to finish all the stuff I already started :/

>
> These are the TODO points and their opens:
>
> - Adjust amdgpu_ctx_set_entity_priority() to call 
> drm_sched_entity_modify_sched() regardless of the hw_type - to fix 
> priority changes for single-scheduler engines other than gfx or compute.

Either that, or stop using the scheduler priority to implement 
userspace priorities altogether and always use different HW queues for that.
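
For illustration, the first TODO would roughly amount to dropping the
hw_ip check in amdgpu_ctx_set_entity_priority(). A sketch only, based on
my reading of the current code; whether amdgpu_ctx_get_hw_prio() and the
adev->gpu_sched[][] table do the right thing for every hw_ip still needs
to be double checked:

static int amdgpu_ctx_set_entity_priority(struct amdgpu_ctx *ctx,
                                          struct amdgpu_ctx_entity *aentity,
                                          int hw_ip, int32_t priority)
{
        struct amdgpu_device *adev = ctx->mgr->adev;
        struct drm_gpu_scheduler **scheds;
        unsigned int hw_prio, num_scheds;

        /* sw priority, goes away once drm_sched_entity_set_priority() is removed */
        drm_sched_entity_set_priority(&aentity->entity,
                                      amdgpu_ctx_to_drm_sched_prio(priority));

        /*
         * hw priority: switch to the scheduler list matching the new priority
         * for every hw_ip, not just GFX/COMPUTE, so single-scheduler engines
         * pick up the change as well.
         */
        hw_prio = amdgpu_ctx_get_hw_prio(ctx, hw_ip);
        hw_prio = array_index_nospec(hw_prio, AMDGPU_RING_PRIO_MAX);
        scheds = adev->gpu_sched[hw_ip][hw_prio].sched;
        num_scheds = adev->gpu_sched[hw_ip][hw_prio].num_scheds;

        drm_sched_entity_modify_sched(&aentity->entity, scheds, num_scheds);

        return 0;
}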

>
> - Document that the sched_list array lifetime must align with the 
> entity's and adjust the callers.
>
> Open:
>
> Do you still oppose keeping sched_list for num_scheds == 1?

Not if you can fix up all callers.

> If so, do you propose drm_sched_entity_modify_sched() keeps disagreeing 
> with drm_sched_entity_init() on this detail, and that we keep the "one 
> shot single sched_list" quirk? Why is that nicer than simply keeping the 
> list and removing the quirk? Once the lifetime rules are clear it is IMO 
> okay to always keep the list.

Yeah, if every caller of drm_sched_entity_init() can be fixed I'm fine 
with that as well.
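
To make the lifetime rule concrete, the idea would be that every caller
keeps the array in storage which lives at least as long as the entity,
even for a single scheduler. Something like this, where "my_ctx" and its
fields are made up purely for illustration:

struct my_ctx {
        struct drm_sched_entity entity;
        /* shares the entity's lifetime instead of living on the stack */
        struct drm_gpu_scheduler *sched_list[1];
};

static int my_ctx_init(struct my_ctx *ctx, struct drm_gpu_scheduler *sched)
{
        ctx->sched_list[0] = sched;

        return drm_sched_entity_init(&ctx->entity, DRM_SCHED_PRIORITY_NORMAL,
                                     ctx->sched_list, 1, NULL);
}

With that in place drm_sched_entity_init() and
drm_sched_entity_modify_sched() could treat num_scheds == 1 exactly like
the multi-scheduler case and the "one shot single sched_list" quirk could
go away.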

>
> - Remove drm_sched_entity_set_priority().
>
> Open:
>
> Should we at this point also modify amdgpu_device_init_schedulers() to 
> stop initialising schedulers with DRM_SCHED_PRIORITY_COUNT run queues?

One step at a time.

>
> Once drm_sched_entity_set_priority() is gone there is little point. 
> Unless there are some non-obvious games with the kernel priority or 
> something.
>
>>>>>> In general, scheduler priorities were meant to be used for things 
>>>>>> like kernel queues which would always have higher priority than 
>>>>>> user space submissions; using them for userspace turned out to 
>>>>>> be not such a good idea.
>>>>>
>>>>> Out of curiosity, what were the problems? I cannot think of 
>>>>> anything fundamental, since there are priorities at the backend 
>>>>> level after all, just with fewer levels.
>>>>
>>>> A higher-priority queue can starve lower-priority queues to the 
>>>> point that they never get a chance to run anything.
>>>
>>> Oh, that.. well I call that an implementation detail. :) Because 
>>> nowhere in the uapi is it, I think, guaranteed that execution ordering 
>>> needs to be strictly in descending priority. This potentially goes 
>>> back to what you said above, that a larger rewrite might be 
>>> beneficial, implementing some smarter scheduling. Although the issue 
>>> will be that it is just frontend scheduling after all, so it is a bit 
>>> questionable to invest in making it too smart.
>>
>> +1 and we have a bug report complaining that RR is in at least one 
>> situation better than FIFO. So it is actually quite hard to remove.
>>
>> On the other hand FIFO has some really nice properties as well. E.g. 
>> try running >100 glxgears instances (on weaker hw) and just visually 
>> compare the results between RR and FIFO: FIFO is baby smooth while RR 
>> basically stutters all the time.
>>
>>> I think this goes more back to what I was suggesting during the early 
>>> xe days, that the drm scheduler should potentially be split into a 
>>> dependency handling part and a scheduling part. Drivers with a 1:1 
>>> entity:scheduler mapping and full hardware/firmware scheduling do not 
>>> really need either FIFO or RR.
>>
>> Yeah that's my thinking as well and I also suggested that multiple 
>> times in discussions with Sima and others.
>>
>>>
>>>> This basically means that userspace gets a chance to submit 
>>>> infinite fences, with all the bad consequences that brings.
>>>>
>>>>>
>>>>> I mean one problem unrelated to this discussion is this:
>>>>>
>>>>> void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
>>>>> {
>>>>>     struct dma_fence *fence;
>>>>>     struct drm_gpu_scheduler *sched;
>>>>>     struct drm_sched_rq *rq;
>>>>>
>>>>>     /* queue non-empty, stay on the same engine */
>>>>>     if (spsc_queue_count(&entity->job_queue))
>>>>>         return;
>>>>>
>>>>> Which makes it look like an entity with a constant trickle of 
>>>>> jobs will never actually manage to change its run queue. Neither 
>>>>> today, nor after the potential drm_sched_entity_set_priority() 
>>>>> removal.
>>>>
>>>> That's intentional and based on how the scheduler load balancing 
>>>> works.
>>>
>>> I see that it is intentional, but if it can silently prevent priority 
>>> changes (even hw priority changes) it is not very solid. Unless I am 
>>> missing something here.
>>
>> IIRC the GSoC student who implemented this (with me as mentor) 
>> actually documented that behavior somewhere.
>>
>> And to be honest it kind of makes sense, because switching priorities 
>> would otherwise be disruptive, e.g. you have a moment where you need 
>> to drain all previous submissions with the old priority before you 
>> can do new ones with the new priority.
>
> Hmmm, I don't see how it makes sense. Perhaps a test case for 
> AMDGPU_SCHED_OP_*_PRIORITY_OVERRIDE is missing to show how it doesn't 
> work. Or at least how easily it can be defeated, with callers none the 
> wiser.

Ok, that needs a somewhat longer explanation. You don't by any chance 
have Teams?

>
> For context, I am interested because I wired up amdgpu to the DRM 
> cgroup controller and use the priority override to de-prioritize 
> certain cgroups, and it kind of works. But again, it will not be great 
> if a client with a constant trickle of submissions can just defeat it.

Yeah, exactly that use case is currently not possible :(
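
To spell out why in code: drm_sched_entity_set_priority() only updates
the priority field, the entity is only moved to a different run queue in
drm_sched_entity_select_rq(), and that function, as quoted above, bails
out as long as the job_queue is non-empty. The body is essentially just
this (comments added for illustration):

void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
                                   enum drm_sched_priority priority)
{
        spin_lock(&entity->rq_lock);
        /*
         * Only the field changes here; the entity stays on its current
         * run queue until drm_sched_entity_select_rq() sees an empty
         * job_queue, so a constant trickle of jobs keeps it on the old
         * priority's rq indefinitely.
         */
        entity->priority = priority;
        spin_unlock(&entity->rq_lock);
}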

Regards,
Christian.


>
> Regards,
>
> Tvrtko


