[RFC v3 00/14] Deadline DRM scheduler
Tvrtko Ursulin
tvrtko.ursulin at igalia.com
Wed Apr 2 08:26:22 UTC 2025
On 02/04/2025 07:49, Christian König wrote:
> Adding Leo since that is especially interesting for our multimedia engines.
>
> @Leo could you spare someone to test and maybe review this?
>
> Am 31.03.25 um 22:16 schrieb Tvrtko Ursulin:
>> This is similar to v2 but I dropped some patches (for now) and added some new
>> ones. Most notably deadline scaling based on queue depth appears to be able to
>> add a little bit of fairness with spammy clients (deep submission queue).
>>
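>> To give a rough idea of the queue depth scaling (a simplified sketch with made
>> up names, not the actual patch), the thought is that the deadline handed to a
>> new job gets pushed further into the future the deeper the entity's own queue
>> already is:
>>
>> static ktime_t example_job_deadline(ktime_t now, u64 base_period_ns,
>> 				    unsigned int queue_depth)
>> {
>> 	/*
>> 	 * Deeper queues get later deadlines so spammy clients cannot
>> 	 * crowd out clients which submit little and often.
>> 	 */
>> 	return ktime_add_ns(now, base_period_ns * max(1u, queue_depth));
>> }
>>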
>> As such, on the high level main advantages of the series:
>>
>> 1. Code simplification - no more multiple run queues.
>> 2. Scheduling quality - schedules better than FIFO.
>> 3. Removing RR brings even more code simplification, but this one needs to be
>> tested and approved by someone who actually uses RR.
>>
>> In the future further simplifications and improvements should be possible on top
>> of this work. But for now I have kept it simple.
>>
>> The first patch adds some unit tests which allow for easy evaluation of scheduling
>> behaviour against different client submission patterns. From there onwards it is
>> a hopefully natural progression of patches (or close) to the end result, which is
>> a slightly fairer scheduler than FIFO.
>>
>> Regarding the submission patterns tested, it is always two parallel clients
>> and they broadly cover these categories:
>>
>> * Deep queue clients
>> * Hogs versus interactive
>> * Priority handling
>
> First of all, impressive piece of work.
Thank you!
I am not super happy though, since what would be much better is some
sort of a CFS. But to do that would require cracking the entity GPU time
tracking problem, which I have tried twice so far, failing to find a
generic, elegant and not too intrusive solution.
>> Lets look at the results:
>>
>> 1. Two normal priority deep queue clients.
>>
>> These submit one second worth of 8ms jobs, as fast as they can, with no
>> dependencies etc. There is no difference in runtime between FIFO and qddl, but
>> the latter allows both clients to progress with their work more evenly:
>>
>> https://people.igalia.com/tursulin/drm-sched-qddl/normal-normal.png
>>
>> (X axis is time, Y is submitted queue-depth, hence lowering of qd corresponds
>> with work progress for both clients, tested with both schedulers separately.)
>
> This was basically the killer argument why we implemented FIFO in the first place. RR completely sucked on fairness when you have many clients submitting many small jobs.
>
> Looks like that the deadline scheduler is even better than FIFO in that regard, but I would also add a test with (for example) 100 clients doing submissions at the same time.
I can try that. So 100 clients with very deep submission queues? How
deep? Fully async? Or some synchronicity and what kind?
>> 2. Same two clients but one is now low priority.
>>
>> https://people.igalia.com/tursulin/drm-sched-qddl/normal-low.png
>>
>> Normal priority client is a solid line, low priority dotted. We can see how FIFO
>> completely starves the low priority client until the normal priority is fully
>> done. Only then does the low priority client get any GPU time.
>>
>> In contrast, qddl allows some GPU time to the low priority client.
>>
>> 3. Same clients but now high versus normal priority.
>>
>> Similar behaviour to the previous case, with normal de-prioritised a bit less
>> relative to high than low was relative to normal.
>>
>> https://people.igalia.com/tursulin/drm-sched-qddl/high-normal.png
>>
>> 4. Heavy load vs interactive client.
>>
>> Heavy client emits a 75% GPU load in the form of 3x 2.5ms jobs followed by a
>> 2.5ms wait.
>>
>> Interactive client emits a 10% GPU load in the form of 1x 1ms job followed
>> by a 9ms wait.
>>
>> This simulates an interactive graphical client running on top of a relatively
>> heavy background load, but with no GPU oversubscription.
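>>
>> (In other words, the heavy client is busy for 3 x 2.5ms = 7.5ms out of every
>> 10ms, hence 75%, while the interactive one is busy for 1ms out of every 10ms,
>> hence 10%.)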
>>
>> The graphs show the interactive client only and, from now on, instead of looking
>> at the client's queue depth, we look at its "fps".
>>
>> https://people.igalia.com/tursulin/drm-sched-qddl/heavy-interactive.png
>>
>> We can see that qddl allows a slightly higher fps for the interactive client,
>> which is good.
>
> The most interesting question for this is what is the maximum frame time?
>
> E.g. how long needs the user to wait for a response from the interactive client at maximum?
I did a quick measurement of those metrics, for this workload only. I
measured the time from submission of the first job in a group (so a
frame) to the time the last job in the group finished, and then
subtracted the expected total job duration to get just the wait plus
overheads latency.
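Expressed as code, with purely illustrative names, that is:

static u64 frame_latency_ns(u64 first_submit_ns, u64 last_finish_ns,
			    u64 expected_jobs_ns)
{
	/* Wait plus scheduling overheads only, expected GPU time removed. */
	return last_finish_ns - first_submit_ns - expected_jobs_ns;
}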
Five averaged runs:
         min     avg     max   [ms]
FIFO     2.5   13.14    18.3
qddl     3.2    9.9     16.6
So qddl is a bit better in max latency, and more so in the average. The
question is how representative this synthetic workload is of the real
world.
Regards,
Tvrtko
>> 5. Low priority GPU hog versus heavy-interactive.
>>
>> Low priority client: 3x 2.5ms jobs followed by a 0.5ms wait.
>> Interactive client: 1x 0.5ms job followed by a 10ms wait.
>>
>> https://people.igalia.com/tursulin/drm-sched-qddl/lowhog-interactive.png
>>
>> No difference between the schedulers.
>>
>> 6. The last set of test scenarios has three subgroups.
>>
>> In all cases we have two interactive (synchronous, single job at a time) clients
>> with a 50% "duty cycle" GPU time usage.
>>
>> Client 1: 1.5ms job + 1.5ms wait (aka short bursty)
>> Client 2: 2.5ms job + 2.5ms wait (aka long bursty)
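>>
>> Each of these runs synchronously, i.e. roughly (simplified pseudo-code, helper
>> names made up, not the actual test code):
>>
>> static void bursty_client(struct test_client *client, u64 job_ns, u64 wait_ns)
>> {
>> 	while (!test_client_done(client)) {
>> 		/* Submit a single job and wait for it to finish... */
>> 		test_client_queue_job(client, job_ns);
>> 		test_client_wait_all(client);
>>
>> 		/* ...then idle for the "wait" part of the duty cycle. */
>> 		test_client_sleep(client, wait_ns);
>> 	}
>> }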
>>
>> a) Both normal priority.
>>
>> https://people.igalia.com/tursulin/drm-sched-qddl/5050-short.png
>> https://people.igalia.com/tursulin/drm-sched-qddl/5050-long.png
>>
>> Both schedulers favour the higher frequency duty cycle, with qddl giving it a
>> little bit more, which should be good for interactivity.
>>
>> b) Normal vs low priority.
>>
>> https://people.igalia.com/tursulin/drm-sched-qddl/5050-normal-low-normal.png
>> https://people.igalia.com/tursulin/drm-sched-qddl/5050-normal-low-low.png
>>
>> Qddl gives a bit more to the normal priority client than to the low one.
>>
>> c) High vs normal priority.
>>
>> https://people.igalia.com/tursulin/drm-sched-qddl/5050-high-normal-high.png
>> https://people.igalia.com/tursulin/drm-sched-qddl/5050-high-normal-normal.png
>>
>> Again, qddl gives a bit more share to the higher priority client.
>>
>> Overall, qddl looks like a potential improvement in terms of fairness,
>> especially avoiding priority starvation. There do not appear to be any
>> regressions with the tested workloads.
>>
>> As before, I am looking for feedback, ideas for what kind of submission
>> scenarios to test. Testers on different GPUs would be very welcome too.
>>
>> And I should probably test round-robin at some point, to see if we are maybe
>> okay to drop it unconditionally, or whether further work improving qddl is needed.
>>
>> v2:
>> * Fixed many rebase errors.
>> * Added some new patches.
>> * Dropped single shot dependency handling.
>>
>> v3:
>> * Added scheduling quality unit tests.
>> * Refined a tiny bit by adding some fairness.
>> * Dropped a few patches for now.
>>
>> Cc: Christian König <christian.koenig at amd.com>
>> Cc: Danilo Krummrich <dakr at redhat.com>
>> Cc: Matthew Brost <matthew.brost at intel.com>
>> Cc: Philipp Stanner <pstanner at redhat.com>
>> Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer at amd.com>
>> Cc: Michel Dänzer <michel.daenzer at mailbox.org>
>>
>> Tvrtko Ursulin (14):
>> drm/sched: Add some scheduling quality unit tests
>> drm/sched: Avoid double re-lock on the job free path
>> drm/sched: Consolidate drm_sched_job_timedout
>> drm/sched: Clarify locked section in drm_sched_rq_select_entity_fifo
>> drm/sched: Consolidate drm_sched_rq_select_entity_rr
>> drm/sched: Implement RR via FIFO
>> drm/sched: Consolidate entity run queue management
>> drm/sched: Move run queue related code into a separate file
>> drm/sched: Add deadline policy
>> drm/sched: Remove FIFO and RR and simplify to a single run queue
>> drm/sched: Queue all free credits in one worker invocation
>> drm/sched: Embed run queue singleton into the scheduler
>> drm/sched: De-clutter drm_sched_init
>> drm/sched: Scale deadlines depending on queue depth
>>
>> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 6 +-
>> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 27 +-
>> drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 5 +-
>> drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 8 +-
>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 8 +-
>> drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 8 +-
>> drivers/gpu/drm/scheduler/Makefile | 2 +-
>> drivers/gpu/drm/scheduler/sched_entity.c | 121 ++--
>> drivers/gpu/drm/scheduler/sched_fence.c | 2 +-
>> drivers/gpu/drm/scheduler/sched_internal.h | 17 +-
>> drivers/gpu/drm/scheduler/sched_main.c | 581 ++++--------------
>> drivers/gpu/drm/scheduler/sched_rq.c | 188 ++++++
>> drivers/gpu/drm/scheduler/tests/Makefile | 3 +-
>> .../gpu/drm/scheduler/tests/tests_scheduler.c | 548 +++++++++++++++++
>> include/drm/gpu_scheduler.h | 17 +-
>> 15 files changed, 962 insertions(+), 579 deletions(-)
>> create mode 100644 drivers/gpu/drm/scheduler/sched_rq.c
>> create mode 100644 drivers/gpu/drm/scheduler/tests/tests_scheduler.c
>>
>