[RFC v7 00/23] DRM scheduling cgroup controller
Tvrtko Ursulin
tvrtko.ursulin at igalia.com
Thu May 8 07:44:55 UTC 2025
On 08/05/2025 07:29, Matthew Brost wrote:
> On Fri, May 02, 2025 at 01:32:33PM +0100, Tvrtko Ursulin wrote:
>> Hi all,
>>
>> This is another respin of this old work^1 but this version is a total rewrite
>> and completely changes how the control is done.
>>
>> This time round the work builds upon the "fair" DRM scheduler work I have posted
>> recently^2. I am including those patches for completeness and because there were
>> some tweaks there.
>>
>> -> It also means people only interested in the cgroup portion probably only
>> need to look at the last seven patches.
>>
>> And of those seven the last one is an example of how a DRM scheduler based DRM
>> driver can be wired up with the cgroup controller. So it is quite simple.
>>
>> To illustrate the runtime effects I ran the Unigine Heaven benchmark in
>> parallel with the deferredmultisampling Vulkan demo, each in its own cgroup.
>> First the scheduling weights were left at the default of 100 each, and we look
>> at the GPU utilisation:
>>
>> https://people.igalia.com/tursulin/drmcgroup-100-100.png
>>
>> It is about equal, or thereabouts, since it oscillates at runtime as the
>> benchmark scenes change.
>>
>> Then we change drm.weight of the deferredmultisampling cgroup to 1:
>>
>> https://people.igalia.com/tursulin/drmcgroup-100-1.png
>>
>> There we see a split of around 75:25 in favour of Unigine Heaven (although it
>> also oscillates, as explained above).
>>
>> It is important to note that with GPUs the control is still nowhere near as
>> precise and accurate as with the CPU controller, and that the fair scheduler
>> is a work in progress. But it works and looks useful.
>>
>> Going into the implementation, this version is much simpler than before since
>> the mechanism of time budgets and over-budget signalling is completely gone,
>> replaced with notifying clients directly about their assigned relative
>> scheduling weights.
>>
>> This connects really nicely with the fair DRM scheduler RFC since we can
>> simply mix the scheduling weight in with the existing scheduling entity
>> priority based runtime-to-vruntime scaling factors.
>>
>> It also means there is much less code in the controller itself.
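>>
>> Very roughly, the idea is along these lines (a simplified sketch with
>> illustrative names only, not the exact code from the patches):
>>
>>   #include <linux/math64.h>
>>   #include <linux/types.h>
>>
>>   /* Default drm.weight, matching the CPU and IO controllers. */
>>   #define EXAMPLE_DRM_WEIGHT_DFL 100
>>
>>   static u64 example_runtime_to_vruntime(u64 runtime_ns, u32 prio_factor,
>>                                          u32 cgroup_weight)
>>   {
>>           /* Existing behaviour: entity priority scaling, kept abstract
>>            * here as a plain multiplier. */
>>           u64 scaled = runtime_ns * prio_factor;
>>
>>           /* New: fold in the cgroup weight. A higher weight makes
>>            * vruntime advance more slowly, so the entity is picked more
>>            * often and gets a larger share of the GPU. (Overflow
>>            * handling omitted for brevity.) */
>>           return div_u64(scaled * EXAMPLE_DRM_WEIGHT_DFL, cgroup_weight);
>>   }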
>>
>> Another advantage is that it is really easy to wire up individual drivers which
>> use the DRM scheduler in the hardware scheduling mode (i.e. not 1:1 firmware
>> scheduling).
>>
>
> Admittedly, I just scanned the series—so it might be easier for you to
> elaborate on the above point.
>
> With hardware scheduling mode, the DRM scheduler is essentially just a
> dependency tracker that hands off scheduling to the hardware. Are you
> suggesting that this series doesn't affect that mode, or does it have
> some impact on hardware scheduling (e.g., holding back jobs with
> resolved dependencies in the KMD)?
No effect on 1:1 drivers.
(Ignoring perhaps some minor effects from "drm/sched: Queue all free
credits in one worker invocation", or micro-effects from removing the
run-queues.)
> Follow-up question: aren't most modern drivers and hardware trending
> toward hardware scheduling mode? If so, what is the motivation for
> making such large changes?
If you are asking about the "fair" scheduler itself, that is covered in
the cover letter for the respective series (benchmark data included).
The goal is to simplify the code base and make it schedule at least as
well, if not better, for all the drivers which are not 1:1.
If you are asking about the cgroup controller (this combined series),
the motivation is in the previous cover letter linked from this one.
There is currently no way to externally control DRM scheduling
priorities, and there are use cases for enabling it. For example,
running something computational in the background and having it compete
less with the foreground tasks for the GPU. Or wiring it up with the
window manager's focused/unfocused window handling to automatically
prioritise foreground tasks via cgroups.
The latter concept was also discussed in the scope of the dmem cgroup
controller, for providing some degree of eviction "protection" to the
foreground task. So it all fits nicely into that sort of usage model.
1:1 drivers can still hook into this (as they have been able to
throughout the life of this RFC). How exactly would be up to the
individual firmware. This RFC would notify the driver "this client has
this relative scheduling weight" and from there it's up to the driver.
I.e. those drivers wouldn't use the drm_sched_cgroup_notify_weight()
helper (which this series provides) when registering with the DRM cgroup
controller, but would have to come up with their own.
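
Purely for illustration, such a driver specific hook could look roughly
like the sketch below. The actual registration interface is whatever the
controller patches define, and all of the example_* names are made up:

  #include <drm/drm_file.h>
  #include <linux/types.h>

  /* Hypothetical per-client driver state. */
  struct example_fw_client {
          u32 fw_handle;
  };

  /* Provided by the driver's firmware interface in this example. */
  void example_fw_set_client_priority(u32 fw_handle, u32 fw_prio);

  /* Map the 1..10000 cgroup weight onto a few firmware priority bands. */
  static u32 example_weight_to_fw_prio(unsigned int weight)
  {
          if (weight < 100)
                  return 0;   /* below default weight -> low */
          else if (weight == 100)
                  return 1;   /* default weight -> normal */
          else
                  return 2;   /* above default weight -> high */
  }

  /* Called when the cgroup controller assigns a new relative weight. */
  static void example_notify_weight(struct drm_file *file_priv,
                                    unsigned int weight)
  {
          struct example_fw_client *client = file_priv->driver_priv;

          example_fw_set_client_priority(client->fw_handle,
                                         example_weight_to_fw_prio(weight));
  }
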
Regards,
Tvrtko
>> On the userspace interface side of things it is the same as before. We have
>> drm.weight as an interface, taking integers from 1 to 10000, the same as the
>> CPU and IO cgroup controllers.
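>>
>> For example, assuming the usual cgroup v2 mount point and an already created
>> "background" cgroup with the drm controller enabled, lowering its weight is
>> just a write to that file (the equivalent of an
>> "echo 1 > /sys/fs/cgroup/background/drm.weight"), or in C:
>>
>>   #include <fcntl.h>
>>   #include <unistd.h>
>>
>>   int main(void)
>>   {
>>           /* Path assumes cgroup v2 is mounted at /sys/fs/cgroup. */
>>           int fd = open("/sys/fs/cgroup/background/drm.weight", O_WRONLY);
>>
>>           if (fd < 0)
>>                   return 1;
>>
>>           /* Same semantics as cpu.weight / io.weight: 1..10000, default 100. */
>>           if (write(fd, "1\n", 2) != 2) {
>>                   close(fd);
>>                   return 1;
>>           }
>>
>>           close(fd);
>>           return 0;
>>   }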
>>
>> About the use cases, it is the same as before. With this we would be able to run
>> a workload in the background and make it compete less with the foreground load.
>> Be it explicitly, or when integrating with Desktop Environments, some of which
>> already have cgroup support for tracking foreground vs background windows or
>> similar.
>>
>> I would be really interested if people tried this out, either directly with
>> the amdgpu support as provided in the series, or by wiring up other drivers.
>>
>> P.S.
>> About the CC list. It's a large series so I will put most people on Cc only in
>> the cover letter as a ping of sorts. Whoever is interested can for now find the
>> series in the archives.
>>
>> 1)
>> https://lore.kernel.org/dri-devel/20231024160727.282960-1-tvrtko.ursulin@linux.intel.com/
>>
>> 2)
>> https://lore.kernel.org/dri-devel/20250425102034.85133-1-tvrtko.ursulin@igalia.com/
>>
>> Cc: Christian König <christian.koenig at amd.com>
>> Cc: Danilo Krummrich <dakr at kernel.org>
>> CC: Leo Liu <Leo.Liu at amd.com>
>> Cc: Maíra Canal <mcanal at igalia.com>
>> Cc: Matthew Brost <matthew.brost at intel.com>
>> Cc: Michal Koutný <mkoutny at suse.com>
>> Cc: Michel Dänzer <michel.daenzer at mailbox.org>
>> Cc: Philipp Stanner <phasta at kernel.org>
>> Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer at amd.com>
>> Cc: Rob Clark <robdclark at gmail.com>
>> Cc: Tejun Heo <tj at kernel.org>
>>
>> Tvrtko Ursulin (23):
>> drm/sched: Add some scheduling quality unit tests
>> drm/sched: Add some more scheduling quality unit tests
>> drm/sched: De-clutter drm_sched_init
>> drm/sched: Avoid double re-lock on the job free path
>> drm/sched: Consolidate drm_sched_job_timedout
>> drm/sched: Consolidate drm_sched_rq_select_entity_rr
>> drm/sched: Implement RR via FIFO
>> drm/sched: Consolidate entity run queue management
>> drm/sched: Move run queue related code into a separate file
>> drm/sched: Free all finished jobs at once
>> drm/sched: Account entity GPU time
>> drm/sched: Remove idle entity from tree
>> drm/sched: Add fair scheduling policy
>> drm/sched: Remove FIFO and RR and simplify to a single run queue
>> drm/sched: Queue all free credits in one worker invocation
>> drm/sched: Embed run queue singleton into the scheduler
>> cgroup: Add the DRM cgroup controller
>> cgroup/drm: Track DRM clients per cgroup
>> cgroup/drm: Add scheduling weight callback
>> cgroup/drm: Introduce weight based scheduling control
>> drm/sched: Add helper for tracking entities per client
>> drm/sched: Add helper for DRM cgroup controller weight notifications
>> drm/amdgpu: Register with the DRM scheduling cgroup controller
>>
>> Documentation/admin-guide/cgroup-v2.rst | 22 +
>> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 6 +-
>> drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 13 +-
>> drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h | 1 +
>> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 9 +
>> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 27 +-
>> drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 5 +-
>> drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 8 +-
>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 8 +-
>> drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 8 +-
>> drivers/gpu/drm/drm_file.c | 11 +
>> drivers/gpu/drm/scheduler/Makefile | 2 +-
>> drivers/gpu/drm/scheduler/sched_entity.c | 158 ++--
>> drivers/gpu/drm/scheduler/sched_fence.c | 2 +-
>> drivers/gpu/drm/scheduler/sched_internal.h | 126 ++-
>> drivers/gpu/drm/scheduler/sched_main.c | 570 +++---------
>> drivers/gpu/drm/scheduler/sched_rq.c | 214 +++++
>> drivers/gpu/drm/scheduler/tests/Makefile | 3 +-
>> .../gpu/drm/scheduler/tests/tests_scheduler.c | 815 ++++++++++++++++++
>> include/drm/drm_drv.h | 26 +
>> include/drm/drm_file.h | 11 +
>> include/drm/gpu_scheduler.h | 68 +-
>> include/linux/cgroup_drm.h | 29 +
>> include/linux/cgroup_subsys.h | 4 +
>> init/Kconfig | 5 +
>> kernel/cgroup/Makefile | 1 +
>> kernel/cgroup/drm.c | 446 ++++++++++
>> 27 files changed, 2024 insertions(+), 574 deletions(-)
>> create mode 100644 drivers/gpu/drm/scheduler/sched_rq.c
>> create mode 100644 drivers/gpu/drm/scheduler/tests/tests_scheduler.c
>> create mode 100644 include/linux/cgroup_drm.h
>> create mode 100644 kernel/cgroup/drm.c
>>
>> --
>> 2.48.0
>>