[RFC 00/14] Deadline scheduler and other ideas
Philipp Stanner
pstanner at redhat.com
Fri Jan 3 13:45:14 UTC 2025
On Fri, 2025-01-03 at 13:31 +0100, Christian König wrote:
> Am 03.01.25 um 13:02 schrieb Tvrtko Ursulin:
> > > > </tldr>
> > > >
> > > > If I remember correctly Christian mentioned recently (give or
> > > > take) that maybe round-robin could be removed. That got me
> > > > thinking how and what could be improved and simplified. So I
> > > > played a bit in the scheduler code and came up with something
> > > > which appears to not crash at least. Whether or not there are
> > > > significant advantages apart from maybe code consolidation and
> > > > reduction is the main thing to be determined.
> > >
> > > Hi,
> > >
> > > so first of all: Happy new year and thx for putting in all that
> > > effort :)
> > >
> > > I gave the series a first look; since this is an RFC, let's
> > > abstain from doing deeper reviews of the exact code for now.
> >
> > Ditto and thank you for taking a look!
>
> Ah, yes. Happy new year guys :)
>
> >
> > > > One big question is whether round-robin can really be removed.
> > > > Does anyone use it, rely on it, or are there even use cases
> > > > where it is much better than FIFO?
> > >
> > > So AFAICS Round Robin is not used anymore by anyone. And my
> > > understanding, too, is that there is not really any use case
> > > where one would want anything except FIFO.
> > >
> > > Looking at 977d97f18b5b ("drm/scheduler: Set the FIFO scheduling
> > > policy as the default"), it seems to me that RR was simply easy
> > > to implement, and it had the disadvantage that on systems under
> > > high load the oldest job could be starved to death, which is why
> > > FIFO was introduced.
> > >
> > > So my guess would be that RR is just a relic.
> > >
> > > If we agree on that, then we could remove RR in any case, and
> > > the subsequent question would be whether FIFO should be replaced
> > > with deadline (or: if there should be FIFO *and* deadline?),
> > > wouldn't it?
> >
> > I am unsure about RR, but I agree that the second part of the
> > question is the one to settle.
>
> Well, we came up with FIFO because we found that RR performed quite
> badly when you have a huge number of submitting applications.
>
> E.g. one of our cloud test cases ran 100 instances of a single game,
> and the worst response time improved massively by switching from RR
> to FIFO.
>
> Different priorities, on the other hand, were originally invented to
> make sure the kernel has precedence over userspace.
Should it, in your opinion, be *guaranteed* that the kernel always
takes precedence, or is "kernel takes precedence almost always"
enough? Because this patchset does seem to do the latter.
> But later we also exposed the priorities to userspace, which results
> in the problem that higher priority queues can starve low priority
> ones.
>
> That's the other reason why I said that RR should probably be
> removed and FIFO changed in a way that the priority is basically
> just a bonus to the score used for sorting the FIFO. I haven't taken
> a deeper look yet, but I think that this is more or less what this
> patch set here does.
>
> What FIFO is still missing compared to RR is some sort of fairness
> between queues.
Before I forget it, let me note it right away:

If we go for such a thing we should get our terminology straight. A
FIFO that suddenly takes deadlines into account is not a FIFO anymore,
so we shouldn't call it that. However, a deadline scheduler with
priority levels doesn't seem to be a deadline scheduler either.

I think, no matter what we do, we should aim for strict congruency
between the GPU scheduler and the CPU scheduler with regard to terms
and naming. The CPU scheduler has SCHED_FIFO, which is pure FIFO with
priorities (behaving exactly like the GPU scheduler's FIFO, currently),
and SCHED_DEADLINE, which is purely based on deadlines (not like in
this patch set).

So we should find a new, suitable term if we go for that, to avoid
any misunderstandings.
P.
> E.g. a queue which hasn't submitted something in a while might get a
> bonus for its submissions compared to a queue which submits stuff
> all the time (or something like that).
>
> Regards,
> Christian.
>
>
> > There may be nuances with different drivers depending on how much
> > they can queue to the hardware/firmware at once. Modern drivers
> > which use a 1:1 sched:entity relationship I don't expect to care
> > about the DRM scheduler's scheduling mode. The fewer jobs a driver
> > can queue to the backend, the more it cares. The question is
> > whether FIFO is ever better, keeping in mind that for the same
> > priority this deadline and FIFO are actually identical.
> >
> > > > See the "drm/sched: Add deadline policy" commit message for a
> > > > short description of what flavour of deadline scheduling it
> > > > is. But in essence it should be a fairer FIFO where higher
> > > > priorities cannot forever starve lower ones.
> > >
> > > See my answer on that patch.
> > >
> > > As you can imagine, I'm wondering if that "better FIFO" would be
> > > worth it considering that we are running into a certain risk of
> > > regressing stuff through this rework.
> >
> > I will reply to that part there then.
> > > > "drm/sched: Connect with dma-fence deadlines" wires up
> > > > dma-fence deadlines to the scheduler because it is easy and
> > > > makes logical sense with this. And I noticed userspace already
> > > > uses it, so why not wire it up fully.
> > >
> > > Userspace uses the dma-fence deadlines, you mean? Do you have a
> > > pointer for us?
> >
> > I've noticed it empirically and the one I could find is this:
> >
> > https://invent.kde.org/plasma/kwin/-/commit/4ad5670ddfcd7400c8b84c12cbf8bd97a0590f43
> >
> >
> >
> > > > Otherwise the series is a bit of a progression from
> > > > consolidating RR into the FIFO code paths, going from there to
> > > > deadline, and then to a change in how dependencies are
> > > > handled. Plus a code simplification to a 1:1 run queue to
> > > > scheduler relationship, because deadline does not need
> > > > per-priority run queues.
> > > >
> > > > There is quite a bit of code to go through here, so I think it
> > > > could be even better if other drivers could give it a spin as
> > > > is and see if some improvements can be detected. Or at least
> > > > no regressions.
> > >
> > > I hope I can dive deeper into the Nouveau side soon.
> >
> > Fantastic!
> > > >
> > > > Cc: Christian König <christian.koenig at amd.com>
> > > > Cc: Danilo Krummrich <dakr at redhat.com>
> > > > Cc: Matthew Brost <matthew.brost at intel.com>
> > > > Cc: Philipp Stanner <pstanner at redhat.com>
> > > >
> > > > Tvrtko Ursulin (14):
> > > > drm/sched: Delete unused update_job_credits
> > > > drm/sched: Remove idle entity from tree
> > > > drm/sched: Implement RR via FIFO
> > > > drm/sched: Consolidate entity run queue management
> > > > drm/sched: Move run queue related code into a separate file
> > > > drm/sched: Ignore own fence earlier
> > > > drm/sched: Resolve same scheduler dependencies earlier
> > > > drm/sched: Add deadline policy
> > > > drm/sched: Remove FIFO and RR and simplify to a single run queue
> > > > drm/sched: Queue all free credits in one worker invocation
> > > > drm/sched: Connect with dma-fence deadlines
> > > > drm/sched: Embed run queue singleton into the scheduler
> > > > dma-fence: Add helper for custom fence context when merging fences
> > > > drm/sched: Resolve all job dependencies in one go
> > >
> > > It seems to me that this series is a "port RR and FIFO to
> > > deadline" series with some additional patches that could be
> > > branched out and reviewed through a separate series?
> > >
> > > Patch 1 ("Delete unused...") for example. Other candidates are
> > > patch 5 ("Move run queue related...") and 13 ("Add helper
> > > for...").
> > >
> > > A few patches might be mergeable even if the main idea with
> > > deadline doesn't get approved; that's why I'm suggesting that.
> >
> > Yes, some of those could be possible and I am happy to extract and
> > rebase in principle. But not yet, I think. If and when something
> > gets a positive nod.
> >
> > Regards,
> >
> > Tvrtko
> >
> > > >
> > > > drivers/dma-buf/dma-fence-unwrap.c | 8 +-
> > > > drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 6 +-
> > > > drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 27 +-
> > > > drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 5 +-
> > > > drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 8 +-
> > > > drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 8 +-
> > > > drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 8 +-
> > > > drivers/gpu/drm/scheduler/Makefile | 2 +-
> > > > drivers/gpu/drm/scheduler/sched_entity.c | 316 ++++++-----
> > > > drivers/gpu/drm/scheduler/sched_fence.c | 5 +-
> > > > drivers/gpu/drm/scheduler/sched_main.c | 587 +++++---------------
> > > > drivers/gpu/drm/scheduler/sched_rq.c | 199 +++++++
> > > > include/drm/gpu_scheduler.h | 74 ++-
> > > > include/linux/dma-fence-unwrap.h | 31 +-
> > > > 14 files changed, 606 insertions(+), 678 deletions(-)
> > > > create mode 100644 drivers/gpu/drm/scheduler/sched_rq.c
> > > >
> > >
>