[RFC 00/14] Deadline scheduler and other ideas
Tvrtko Ursulin
tvrtko.ursulin at igalia.com
Fri Jan 10 09:16:44 UTC 2025
On 09/01/2025 19:59, Matthew Brost wrote:
> On Wed, Jan 08, 2025 at 06:55:16PM +0000, Tvrtko Ursulin wrote:
>>
>> On 08/01/2025 16:57, Danilo Krummrich wrote:
>>> On Wed, Jan 08, 2025 at 03:13:39PM +0000, Tvrtko Ursulin wrote:
>>>>
>>>> On 08/01/2025 08:31, Danilo Krummrich wrote:
>>>>> On Mon, Dec 30, 2024 at 04:52:45PM +0000, Tvrtko Ursulin wrote:
>>>>>> From: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>>>>>
>>>>> "Deadline scheduler and other ideas"
>>>>>
>>>>> There are a few patches that could be sent outside the scope of this
>>>>> series, e.g. the first one.
>>>>>
>>>>> I think it would make sense to do so.
>>>>
>>>> For now I'll keep them at the head of this RFC, and as they get acked or
>>>> r-b-ed I can easily send them standalone or re-ordered. Until then,
>>>> splitting them out would leave the RFC not standalone.
>>>>
>>>>>> <tldr>
>>>>>> Replacing FIFO with a flavour of deadline driven scheduling and removing
>>>>>> round-robin. Connecting the scheduler with dma-fence deadlines. This is a
>>>>>> first draft; testing by different drivers and feedback would be nice. I
>>>>>> was only able to test it with amdgpu. Other drivers may not even compile.
>>>>>
>>>>> What are the results from your tests with amdgpu? Do you have some measurements?
>>>>
>>>> We already covered this in the thread with Philipp to a degree. Tl;dr: the
>>>> main question is whether we can simplify the code and at least not regress.
>>>>
>>>> I don't expect improvements on the amdgpu side with workloads like games
>>>> and benchmarks. I did not measure anything significant, apart from the fact
>>>> that priorities still seem to work with the run queues removed.
>>>
>>> I appreciate the effort, and generally I like the idea, but I also must admit
>>> that this isn't the most convincing motivation for such an integral change
>>> (especially the "at least not regress" part).
>>
>> It is challenging, yes. But for completeness, the full context of what you
>> quoted (if you also read my replies to Philipp) was: *if* we can shrink the
>> code base, add some fairness to FIFO, *and* not regress, then those three
>> added together would IMHO not be bad. We shouldn't be scared to touch it,
>> because only by touching it can you truly understand the gotchas, which no
>> amount of kerneldoc will help with.
>>> I'd still like to encourage you to send the small cleanups separately, get them
>>> in soon and leave the deadline scheduler as a separate RFC.
>>>
>>> Meanwhile, Philipp is working on getting the documentation straight and
>>> digging into all the FIXMEs of the scheduler to get to a cleaner baseline.
>>> And with your cleanups you're already helping with that.
>>>
>>> For now, I'd prefer to leave the deadline scheduler stuff for when things are a
>>> bit more settled and / or drivers declare the need.
>>
>> I just sent v2:
>>
>> Regarding motivation for the documentation efforts:
>>
>> 13 files changed, 424 insertions(+), 576 deletions(-)
>>
>> Fewer lines to document. ;)
>>
>> On a serious note, I ordered the series (mostly*) so you can read it in
>> order, and for patches/ideas you like please say so and I can extract and
>> send them separately if you want. I am reluctant to extract things
>> beforehand, before knowing which ones people will like, and so far only one
>> has acks.
>>
>> *)
>> Mostly because perhaps "drm/sched: Queue all free credits in one worker
>> invocation" could be interesting to move ahead of most of the series.
>>
>
> I looked into this. When I originally changed the scheduler from a
> kthread to a worker, I designed it the way your patch implements it:
> looping in the worker until credits run out or no jobs are available.
>
> If I recall correctly, the feedback from Christian (or Luben?) was to
> rely on the work queue's requeuing mechanism to submit more than one
> job. From a latency perspective, there might be a small benefit, but
> it's more likely that if you queue two jobs back-to-back, even when
> relying on the work queue's rescheduling, the first job will still be
> running on the hardware, nullifying any potential latency improvement.
>
> From a fairness perspective, multiplexing across multiple work queues
> one job at a time makes a bit more sense, in my opinion.
You mean multiplexing across multiple _entities_? Because there is only
one work queue, and that is unchanged with my patch. I.e. it does not
change to pick all jobs from a single entity, but still picks one job at
a time from the top entity, and the top entity can change as jobs are
popped. What remains is the question of why burn CPU cycles and do it in
a roundabout way, when it is very easy to do it directly and at the same
time avoid that unconditional final wakeup when the queues are empty.
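
For illustration, here is a rough, userspace-only sketch of what I mean.
None of the names are the real drm_sched API (toy_sched,
toy_pick_top_entity etc. are made up) and it glosses over locking,
credit accounting and the entity tree, but it shows the drain loop
re-picking the top entity for every job and simply returning, with no
extra wakeup, once credits or jobs run out:

/* Toy, userspace-only illustration; not the real drm_sched code. */
#include <stdbool.h>
#include <stdio.h>

struct toy_entity {
	int queued;			/* jobs waiting in this entity */
};

struct toy_sched {
	struct toy_entity *entities;
	int num_entities;
	int credits;			/* free hardware credits */
};

/* Hypothetical stand-in for "pick the currently best entity". */
static struct toy_entity *toy_pick_top_entity(struct toy_sched *s)
{
	struct toy_entity *best = NULL;
	int i;

	for (i = 0; i < s->num_entities; i++)
		if (s->entities[i].queued &&
		    (!best || s->entities[i].queued > best->queued))
			best = &s->entities[i];

	return best;
}

/*
 * drain == false mimics the current behaviour: submit one job and rely on
 * the work item being re-queued to be called again.  drain == true is what
 * the patch does: keep going, re-picking the top entity per job, until
 * credits or jobs run out, so no trailing wakeup is needed.
 */
static void toy_run_job_work(struct toy_sched *s, bool drain)
{
	do {
		struct toy_entity *e = toy_pick_top_entity(s);

		if (!e || !s->credits)
			return;		/* nothing (more) to do, no extra wakeup */

		e->queued--;		/* "pop" one job from the top entity */
		s->credits--;		/* and charge its credit */
		printf("submitted a job, %d credits left\n", s->credits);
	} while (drain);		/* non-drain mode re-queues the work here */
}

int main(void)
{
	struct toy_entity ents[] = { { .queued = 3 }, { .queued = 2 } };
	struct toy_sched s = { .entities = ents, .num_entities = 2, .credits = 4 };

	toy_run_job_work(&s, true);	/* drain everything in one invocation */

	return 0;
}

The single-job variant would replace the loop with re-queuing the work
item after each submission, which is exactly the extra round trip, plus
the final wakeup on empty queues, that the patch is trying to avoid.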
Regards,
Tvrtko
>>>> Where something could show up is if someone is aware of a workload where
>>>> normal priority starves low priority, since one part of the idea is that
>>>> with the "deadline" scheme those should behave a little more balanced.
>>>>
>>>> Also again, feedback (including testing feedback from other drivers) would
>>>> be great, and ideas of which workloads to test.
>>>
>>> Unfortunately, there's not much I can tell from the Nouveau side of things,
>>> since we're using the firmware scheduler scheme (one entity per scheduler) and
>>> hence the scheduling strategy isn't that relevant.
>>
>> Yeah. Hopefully someone with more appropriate hardware gets intrigued to try
>> it out, or to suggest interesting workloads.
>>
>> Until then I am happy to keep it alive in the background and, as said, you
>> can pick and choose the parts you like.
>>
>> Regards,
>>
>> Tvrtko
>>
>>>
>>>>
>>>> Btw I will send a respin in a day or so, which will clean up some things
>>>> and add some more tiny bits.
>>>>
>>>> Regards,
>>>>
>>>> Tvrtko
>>>