[RFC 00/14] Deadline scheduler and other ideas

Philipp Stanner phasta at mailbox.org
Fri Jan 17 12:12:17 UTC 2025


On Mon, 2024-12-30 at 16:52 +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
> 
> <tldr>
> Replacing FIFO with a flavour of deadline driven scheduling and removing
> round-robin. Connecting the scheduler with dma-fence deadlines. First draft
> and testing by different drivers and feedback would be nice. I was only able
> to test it with amdgpu. Other drivers may not even compile.
> </tldr>
> 
> If I remember correctly Christian mentioned recently (give or take) that
> maybe round-robin could be removed. That got me thinking how and what could
> be improved and simplified. So I played a bit in the scheduler code and came
> up with something which appears to not crash at least. Whether or not there
> are significant advantages apart from maybe code consolidation and reduction
> is the main thing to be determined.
> 
> One big question is whether round-robin can really be removed. Does anyone
> use it, rely on it, or what are even use cases where it is much better than
> FIFO?
> 
> See "drm/sched: Add deadline policy" commit message for a short description
> on what flavour of deadline scheduling it is. But in essence it should be a
> more fair FIFO where higher priority can not forever starve lower priorities.
> 
> "drm/sched: Connect with dma-fence deadlines" wires up dma-fence deadlines to
> the scheduler because it is easy and makes logical sense with this. And I
> noticed userspace already uses it so why not wire it up fully.
> 
> Otherwise the series is a bit of progression from consolidating RR into FIFO
> code paths and going from there to deadline and then to a change in how
> dependencies are handled. And code simplification to 1:1 run queue to
> scheduler relationship, because deadline does not need per priority run
> queues.
> 
> There is quite a bit of code to go through here so I think it could be even
> better if other drivers could give it a spin as is and see if some
> improvements can be detected. Or at least no regressions.

Soooo – I have thought about this series a bit more and also read a bit
about the issues Michel recently mentioned.

As Danilo also pointed out, going for an experiment like that right now
is not a good idea. Not while the scheduler is still in its current
shape, and not without powerful tools for regression testing.

That said, we are slowly moving in the right direction. I think one
of the things we're lacking is good testing infrastructure. In fact,
it has been on my list for a while now to write kunit tests for the
scheduler (beginning with the basics, submitting a number of jobs and
all that), so that we get a better mechanism for detecting regressions.

Once we have more infrastructure for systematic testing, we could
gradually become more open to looking into more daring changes.

Unfortunately, I haven't managed so far to free up time to dedicate
to that effort. If you, Tvrtko, have capacity for that, I of course
wouldn't mind at all; that could help greatly.


Regards,
Philipp


> 
> Cc: Christian König <christian.koenig at amd.com>
> Cc: Danilo Krummrich <dakr at redhat.com>
> Cc: Matthew Brost <matthew.brost at intel.com>
> Cc: Philipp Stanner <pstanner at redhat.com>
> 
> Tvrtko Ursulin (14):
>   drm/sched: Delete unused update_job_credits
>   drm/sched: Remove idle entity from tree
>   drm/sched: Implement RR via FIFO
>   drm/sched: Consolidate entity run queue management
>   drm/sched: Move run queue related code into a separate file
>   drm/sched: Ignore own fence earlier
>   drm/sched: Resolve same scheduler dependencies earlier
>   drm/sched: Add deadline policy
>   drm/sched: Remove FIFO and RR and simplify to a single run queue
>   drm/sched: Queue all free credits in one worker invocation
>   drm/sched: Connect with dma-fence deadlines
>   drm/sched: Embed run queue singleton into the scheduler
>   dma-fence: Add helper for custom fence context when merging fences
>   drm/sched: Resolve all job dependencies in one go
> 
>  drivers/dma-buf/dma-fence-unwrap.c          |   8 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c      |   6 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c     |  27 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.h     |   5 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h   |   8 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c |   8 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c     |   8 +-
>  drivers/gpu/drm/scheduler/Makefile          |   2 +-
>  drivers/gpu/drm/scheduler/sched_entity.c    | 316 ++++++-----
>  drivers/gpu/drm/scheduler/sched_fence.c     |   5 +-
>  drivers/gpu/drm/scheduler/sched_main.c      | 587 +++++---------------
>  drivers/gpu/drm/scheduler/sched_rq.c        | 199 +++++++
>  include/drm/gpu_scheduler.h                 |  74 ++-
>  include/linux/dma-fence-unwrap.h            |  31 +-
>  14 files changed, 606 insertions(+), 678 deletions(-)
>  create mode 100644 drivers/gpu/drm/scheduler/sched_rq.c
> 


