[RFC 00/14] Deadline scheduler and other ideas
Tvrtko Ursulin
tvrtko.ursulin at igalia.com
Mon Jan 13 12:59:03 UTC 2025
On 10/01/2025 17:28, Matthew Brost wrote:
> On Fri, Jan 10, 2025 at 09:16:44AM +0000, Tvrtko Ursulin wrote:
>>
>> On 09/01/2025 19:59, Matthew Brost wrote:
>>> On Wed, Jan 08, 2025 at 06:55:16PM +0000, Tvrtko Ursulin wrote:
>>>>
>>>> On 08/01/2025 16:57, Danilo Krummrich wrote:
>>>>> On Wed, Jan 08, 2025 at 03:13:39PM +0000, Tvrtko Ursulin wrote:
>>>>>>
>>>>>> On 08/01/2025 08:31, Danilo Krummrich wrote:
>>>>>>> On Mon, Dec 30, 2024 at 04:52:45PM +0000, Tvrtko Ursulin wrote:
>>>>>>>> From: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
>>>>>>>
>>>>>>> "Deadline scheduler and other ideas"
>>>>>>>
>>>>>>> There are a few patches that could be sent outside the scope of this series, e.g.
>>>>>>> the first one.
>>>>>>>
>>>>>>> I think it would make sense to do so.
>>>>>>
>>>>>> For now I'll keep them at the head of this RFC and, as they get acked or
>>>>>> r-b-ed, I can easily send them standalone or re-ordered. Until then,
>>>>>> sending them separately would leave the RFC not standalone.
>>>>>>
>>>>>>>> <tldr>
>>>>>>>> Replacing FIFO with a flavour of deadline driven scheduling and removing
>>>>>>>> round-robin. Connecting the scheduler with dma-fence deadlines. This is a
>>>>>>>> first draft; testing by different drivers and feedback would be nice. I was
>>>>>>>> only able to test it with amdgpu. Other drivers may not even compile.
>>>>>>>
>>>>>>> What are the results from your tests with amdgpu? Do you have some measurements?
>>>>>>
>>>>>> We already covered this in the thread with Philipp to a degree. Tl;dr: the
>>>>>> main question is whether we can simplify the code and at least not regress.
>>>>>>
>>>>>> I don't expect improvements on the amdgpu side with workloads like games
>>>>>> and benchmarks. I did not measure anything significant, apart from the fact
>>>>>> that priorities seem to work with the run queues removed.
>>>>>
>>>>> I appreciate the effort, and generally I like the idea, but I also must admit
>>>>> that this isn't the most convincing motivation for such an integral change
>>>>> (especially the "at least not regress" part).
>>>>
>>>> It is challenging, yes. But for completeness, the full context of what you
>>>> quoted (if you also read my replies to Philipp) was *if* we can shrink the
>>>> code base, add some fairness to FIFO, *and* not regress, then those three
>>>> added together would IMHO not be bad. We shouldn't be scared to touch it,
>>>> because only by touching it can you truly understand the gotchas which no
>>>> amount of kerneldoc will help with.
>>>>> I'd still like to encourage you to send the small cleanups separately, get them
>>>>> in soon and leave the deadline scheduler as a separate RFC.
>>>>>
>>>>> Meanwhile, Philipp is working on getting the documentation straight and
>>>>> digging into all the FIXMEs of the scheduler, getting to a cleaner baseline.
>>>>> And with your cleanups you're already helping with that.
>>>>>
>>>>> For now, I'd prefer to leave the deadline scheduler stuff for when things are a
>>>>> bit more settled and / or drivers declare the need.
>>>>
>>>> I just sent v2:
>>>>
>>>> About motivation for the documenting efforts:
>>>>
>>>> 13 files changed, 424 insertions(+), 576 deletions(-)
>>>>
>>>> Fewer lines to document. ;)
>>>>
>>>> On a serious note, I ordered the series (mostly*) so you can read it in
>>>> order; for patches/ideas you like, please say so and I can extract and send
>>>> them separately. I am reluctant to extract things beforehand, before knowing
>>>> which ones people will like, and so far only one has acks.
>>>>
>>>> *)
>>>> Mostly because perhaps "drm/sched: Queue all free credits in one worker
>>>> invocation" could be interesting to move ahead of most of the rest.
>>>>
>>>
>>> I looked into this. When I originally changed the scheduler from a
>>> kthread to a worker, I designed it the way your patch implements it:
>>> looping in the worker until credits run out or no jobs are available.
>>>
>>> If I recall correctly, the feedback from Christian (or Luben?) was to
>>> rely on the work queue's requeuing mechanism to submit more than one
>>> job. From a latency perspective, there might be a small benefit, but
>>> it's more likely that if you queue two jobs back-to-back, even when
>>> relying on the work queue's rescheduling, the first job will still be
>>> running on the hardware, nullifying any potential latency improvement.
>>>
>>> From a fairness perspective, multiplexing across multiple work queues
>>> one job at a time makes a bit more sense, in my opinion.
>>
>> You mean multiplexing across multiple _entities_? Because the work queue is only
>
> No, I mean if you have multiple schedulers (work queues) with jobs ready
> to run, dequeuing one job at a time per scheduler would let the core
> workqueue scheduling give a level of fairness.
That doesn't work in the current implementation because entity->rq (i.e.
the picked scheduler) cannot change while jobs are queued.
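
To illustrate, a simplified reading of drm_sched_entity_select_rq() (a
sketch only, not verbatim upstream code):

    /* An entity may only re-pick a scheduler while it is idle, so with
     * jobs queued there is nothing for cross-workqueue multiplexing to
     * balance.
     */
    static void entity_select_rq_sketch(struct drm_sched_entity *entity)
    {
            struct dma_fence *fence;

            if (spsc_queue_count(&entity->job_queue))
                    return; /* jobs queued - entity->rq stays put */

            fence = rcu_dereference_check(entity->last_scheduled, true);
            if (fence && !dma_fence_is_signaled(fence))
                    return; /* last job still in flight - stay put */

            /* only now may entity->rq move to the best/idlest sched */
    }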
>> one. That is unchanged with my patch. I.e. it is not changing to pick jobs
>> from a single entity, but still picks one job at a time from the top entity.
>> And the top entity can change as jobs are popped. What remains is the
>> question of why burn CPU cycles and do it in a roundabout way, when it is
>> very easy to do it directly and at the same time avoid that unconditional
>> final wakeup when the queues are empty.
>
> Like I said, I had it this way initially but the feedback I received was to
> dequeue one job at a time and kick the work queue to reschedule itself.
> Unless everyone's opinion has changed, I don't think this is a change we
> should make.
>
> Here are some references...
>
> [1] https://patchwork.freedesktop.org/patch/530652/?series=116055&rev=1
> [2] https://patchwork.freedesktop.org/patch/575874/?series=129143&rev=1
> [3] https://patchwork.freedesktop.org/patch/576334/?series=129286&rev=1
Thanks. Aside from the referenced discussions also being about fixing some
regressions, on the topic of the high level design I spotted two arguments.
One was about the supposed need to "interleave" one-at-a-time run work
with the same for free work. I don't see why that would be the case and,
besides, it is already not universally true when unordered workqueues are
used. To make the comparison concrete, the two worker shapes under
discussion are sketched below.
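
(A rough sketch only; the names follow the upstream helpers as I read
them, but the bodies are illustrative rather than the actual patch.)

    /* Current shape: one job per worker invocation, then ask the
     * workqueue to run us again - the unconditional final wakeup.
     */
    static void run_job_work_one(struct work_struct *w)
    {
            struct drm_gpu_scheduler *sched =
                    container_of(w, struct drm_gpu_scheduler, work_run_job);
            struct drm_sched_entity *entity = drm_sched_select_entity(sched);

            if (!entity)
                    return;

            /* ... pop one job from the entity and submit it ... */

            drm_sched_run_job_queue(sched); /* self-requeue */
    }

    /* Proposed shape: drain in one invocation, re-picking the top
     * entity after every pop, until credits or jobs run out. No
     * final empty wakeup.
     */
    static void run_job_work_all(struct work_struct *w)
    {
            struct drm_gpu_scheduler *sched =
                    container_of(w, struct drm_gpu_scheduler, work_run_job);
            struct drm_sched_entity *entity;

            while ((entity = drm_sched_select_entity(sched))) {
                    /* ... pop one job from the top entity, submit it ... */
            }
    }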
Second was to bail out soon(-er) if the scheduler is paused. That's a good
point and I have added it to my local version.
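
In the looping variant that becomes a per-iteration check, roughly
(sched->pause_submit being the flag the worker already tests on entry):

    while ((entity = drm_sched_select_entity(sched))) {
            /* Bail soon(-er) if drm_sched_wqueue_stop() paused the
             * scheduler while we were draining, instead of only
             * checking on worker entry.
             */
            if (READ_ONCE(sched->pause_submit))
                    break;

            /* ... pop and submit one job ... */
    }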
In summary, it feels this area should be clarified further.
Regards,
Tvrtko