[RFC v7 00/23] DRM scheduling cgroup controller

Matthew Brost matthew.brost at intel.com
Thu May 8 06:29:25 UTC 2025


On Fri, May 02, 2025 at 01:32:33PM +0100, Tvrtko Ursulin wrote:
> Hi all,
> 
> This is another respin of this old work^1, but this version is a total rewrite
> and completely changes how the control is done.
> 
> This time round the work builds upon the "fair" DRM scheduler work I have posted
> recently^2. I am including those patches for completeness and because there were
> some tweaks there.
> 
> -> It also means that people only interested in the cgroup portion probably
>    only need to look at the last seven patches.
> 
>    And of those seven the last one is an example of how a DRM scheduler based
>    driver can be wired up with the cgroup controller. So it is quite simple.
> 
> To illustrate the runtime effects I ran the Unigine Heaven benchmark in
> parallel with the deferredmultisampling Vulkan demo, each in its own cgroup.
> First the scheduling weights were left at the default of 100 each, and we
> look at the GPU utilisation:
> 
>   https://people.igalia.com/tursulin/drmcgroup-100-100.png
> 
> It is roughly equal, or thereabouts, since it oscillates at runtime as the
> benchmark scenes change.
> 
> Then we change drm.weight of the deferredmultisampling cgroup to 1:
> 
>   https://people.igalia.com/tursulin/drmcgroup-100-1.png
> 
> There we see a split of around 75:25 in favour of Unigine Heaven (although it
> also oscillates, as explained above).
> 
> It is important to note that with GPUs the control is still nowhere near as
> precise and accurate as with the CPU controller, and that the fair scheduler
> is a work in progress. But it works and looks useful.
> 
> Going into the implementation, this version is much simpler than before,
> since the mechanism of time budgets and over-budget signalling is completely
> gone, replaced by notifying clients directly about their assigned relative
> scheduling weights.
> 
> This connects really nicely with the fair DRM scheduler RFC, since we can
> simply mix the cgroup scheduling weight into the existing entity priority
> based runtime-to-vruntime scaling factors.
> 
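> For illustration, a minimal sketch of the scaling idea (the function name
> and the exact formula are assumptions for illustration, not the actual
> helpers from the series). The entity's accrued GPU time is divided by the
> weight before it feeds the vruntime ordering, so higher-weight clients
> accumulate vruntime more slowly and are therefore picked more often:
> 
>   #include <stdint.h>
> 
>   /*
>    * Hypothetical scaling helper. drm.weight takes values 1..10000 with
>    * a default of 100, so the default weight leaves the runtime
>    * unscaled. The entity priority factor from the fair scheduler RFC
>    * would be mixed in the same way.
>    */
>   static uint64_t scaled_vruntime_delta(uint64_t runtime_ns,
>                                         unsigned int cgroup_weight)
>   {
>           /* weight >= 1 is guaranteed by the interface */
>           return runtime_ns * 100ull / cgroup_weight;
>   }
> 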
> It also means there is much less code in the controller itself.
> 
> Another advantage is that it is really easy to wire up individual drivers
> which use the DRM scheduler in the hardware scheduling mode (i.e. not 1:1
> firmware scheduling), as sketched below.
> 
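> As a sketch of what the wiring could look like in such a driver (the
> callback name, the iteration helper and drm_sched_entity_set_weight()
> are assumptions for illustration, not the exact API from the series):
> 
>   /*
>    * Hypothetical driver hook: the cgroup controller notifies the
>    * driver of a client's new relative weight and the driver forwards
>    * it to all of the client's scheduling entities, where the scheduler
>    * folds it into its vruntime scaling.
>    */
>   static void foo_notify_weight(struct drm_file *file_priv,
>                                 unsigned int weight)
>   {
>           struct foo_client *client = file_priv->driver_priv;
>           struct drm_sched_entity *entity;
> 
>           foo_client_for_each_entity(client, entity)
>                   drm_sched_entity_set_weight(entity, weight);
>   }
> 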

Admittedly, I have only scanned the series, so it might be easier for you to
elaborate on the above point.

With hardware scheduling mode, the DRM scheduler is essentially just a
dependency tracker that hands off scheduling to the hardware. Are you
suggesting that this series doesn't affect that mode, or does it have
some impact on hardware scheduling (e.g., holding back jobs with
resolved dependencies in the KMD)?

Follow-up question: aren't most modern drivers and hardware trending
toward hardware scheduling mode? If so, what is the motivation for
making such large changes?

Matt

> On the userspace interface side of things it is the same as before. We have
> drm.weight as an interface, taking integers from 1 to 10000, the same as the
> CPU and IO cgroup controllers.
> 
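> To make the interface concrete, deprioritising a background cgroup could
> look like this from userspace (the cgroup name and path are assumptions
> for illustration):
> 
>   #include <fcntl.h>
>   #include <stdio.h>
>   #include <unistd.h>
> 
>   int main(void)
>   {
>           /* Assumes a "background" cgroup exists with the DRM
>            * controller enabled; 1 is the minimum weight, 100 the
>            * default. */
>           int fd = open("/sys/fs/cgroup/background/drm.weight", O_WRONLY);
> 
>           if (fd < 0 || write(fd, "1", 1) != 1) {
>                   perror("drm.weight");
>                   return 1;
>           }
>           close(fd);
>           return 0;
>   }
> 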
> About the use cases, it is the same as before. With this we would be able to
> run a workload in the background and make it compete less with the foreground
> load. This could be done either explicitly, or by integrating with desktop
> environments, some of which already have cgroup support for tracking
> foreground versus background windows and similar.
> 
> I would be really interested to hear from anyone who tries this out, either
> with the amdgpu support provided in the series directly, or by wiring up
> other drivers.
> 
> P.S.
> About the CC list. It's a large series so I will put most people on Cc only
> in the cover letter, as a ping of sorts. Whoever is interested can for now
> find the series in the archives.
> 
> 1)
> https://lore.kernel.org/dri-devel/20231024160727.282960-1-tvrtko.ursulin@linux.intel.com/
> 
> 2)
> https://lore.kernel.org/dri-devel/20250425102034.85133-1-tvrtko.ursulin@igalia.com/
> 
> Cc: Christian König <christian.koenig at amd.com>
> Cc: Danilo Krummrich <dakr at kernel.org>
> CC: Leo Liu <Leo.Liu at amd.com>
> Cc: Maíra Canal <mcanal at igalia.com>
> Cc: Matthew Brost <matthew.brost at intel.com>
> Cc: Michal Koutný <mkoutny at suse.com>
> Cc: Michel Dänzer <michel.daenzer at mailbox.org>
> Cc: Philipp Stanner <phasta at kernel.org>
> Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer at amd.com>
> Cc: Rob Clark <robdclark at gmail.com>
> Cc: Tejun Heo <tj at kernel.org>
> 
> Tvrtko Ursulin (23):
>   drm/sched: Add some scheduling quality unit tests
>   drm/sched: Add some more scheduling quality unit tests
>   drm/sched: De-clutter drm_sched_init
>   drm/sched: Avoid double re-lock on the job free path
>   drm/sched: Consolidate drm_sched_job_timedout
>   drm/sched: Consolidate drm_sched_rq_select_entity_rr
>   drm/sched: Implement RR via FIFO
>   drm/sched: Consolidate entity run queue management
>   drm/sched: Move run queue related code into a separate file
>   drm/sched: Free all finished jobs at once
>   drm/sched: Account entity GPU time
>   drm/sched: Remove idle entity from tree
>   drm/sched: Add fair scheduling policy
>   drm/sched: Remove FIFO and RR and simplify to a single run queue
>   drm/sched: Queue all free credits in one worker invocation
>   drm/sched: Embed run queue singleton into the scheduler
>   cgroup: Add the DRM cgroup controller
>   cgroup/drm: Track DRM clients per cgroup
>   cgroup/drm: Add scheduling weight callback
>   cgroup/drm: Introduce weight based scheduling control
>   drm/sched: Add helper for tracking entities per client
>   drm/sched: Add helper for DRM cgroup controller weight notifications
>   drm/amdgpu: Register with the DRM scheduling cgroup controller
> 
>  Documentation/admin-guide/cgroup-v2.rst       |  22 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c        |   6 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c       |  13 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h       |   1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   9 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c       |  27 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.h       |   5 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h     |   8 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c   |   8 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c       |   8 +-
>  drivers/gpu/drm/drm_file.c                    |  11 +
>  drivers/gpu/drm/scheduler/Makefile            |   2 +-
>  drivers/gpu/drm/scheduler/sched_entity.c      | 158 ++--
>  drivers/gpu/drm/scheduler/sched_fence.c       |   2 +-
>  drivers/gpu/drm/scheduler/sched_internal.h    | 126 ++-
>  drivers/gpu/drm/scheduler/sched_main.c        | 570 +++---------
>  drivers/gpu/drm/scheduler/sched_rq.c          | 214 +++++
>  drivers/gpu/drm/scheduler/tests/Makefile      |   3 +-
>  .../gpu/drm/scheduler/tests/tests_scheduler.c | 815 ++++++++++++++++++
>  include/drm/drm_drv.h                         |  26 +
>  include/drm/drm_file.h                        |  11 +
>  include/drm/gpu_scheduler.h                   |  68 +-
>  include/linux/cgroup_drm.h                    |  29 +
>  include/linux/cgroup_subsys.h                 |   4 +
>  init/Kconfig                                  |   5 +
>  kernel/cgroup/Makefile                        |   1 +
>  kernel/cgroup/drm.c                           | 446 ++++++++++
>  27 files changed, 2024 insertions(+), 574 deletions(-)
>  create mode 100644 drivers/gpu/drm/scheduler/sched_rq.c
>  create mode 100644 drivers/gpu/drm/scheduler/tests/tests_scheduler.c
>  create mode 100644 include/linux/cgroup_drm.h
>  create mode 100644 kernel/cgroup/drm.c
> 
> -- 
> 2.48.0
> 

