[systemd-devel] Realtime scheduling with CONFIG_RT_GROUP_SCHED=y

Lennart Poettering lennart at poettering.net
Mon Jul 10 08:26:21 UTC 2017


On Thu, 06.07.17 10:56, Lars Kellogg-Stedman (lars at redhat.com) wrote:

> I'm running on a kernel with CONFIG_RT_GROUP_SCHED=y.  I understand that
> this is counter to the recommendation in the README ("We recommend to turn
> off Real-Time group scheduling in the kernel when using systemd...."), but
> I don't have control over the kernel configuration.
> 
> On this system, it appears that starting "docker" (docker-ce-17.06.0.ce-1)
> results in the creation of new cpu cgroups that for some reason apply to
> systemd services.  That is, after starting docker,
> /sys/fs/cgroup/cpu/system.slice exists when previously it didn't.
> 
> Once this happens, a service that attempts to set realtime scheduling
> (SCHED_RR) via sched_setscheduler() will fail, presumably because the
> cgroup has no realtime budget in cpu.rt_runtime_us.
> 
> In older versions of systemd one could handle this using the directives
> described in
> https://www.freedesktop.org/wiki/Software/systemd/MyServiceCantGetRealtime/,
> but unfortunately that document, despite being the number 1 search result
> for pretty much anything involving "systemd" and "realtime", is obsolete
> and those directives no longer exist.
> 
> Is there a way to make this work correctly with modern versions of
> systemd?  I've hacked around it for now by creating
> /etc/systemd/system/myservice.service.d/realtime.conf that moves the
> service back to the root cgroup and then uses chrt to set the scheduling
> policy:
> 
>   [Service]
>   ExecStartPost=/bin/cgclassify -g cpu:/ $MAINPID
>   ExecStartPost=/bin/chrt -r -p 99 $MAINPID
> 
> ...and while that works, it seems really ugly.  I've attempted to set
> CPUSchedulingPolicy=rr in the unit, but that simply results in systemd
> failing to start the service and logging "Failed at step SETSCHEDULER
> spawning...".
> 
> Is there a better way of addressing this?

Hmm, by default, systemd should not be adding anything to the "cpu"
hierarchy, unless at least one service sets CPUShares=, CPUAccounting= or
related, or system-wide DefaultCPUAccounting= is set. Unfortunately,
there's currently no nice tool to track down why a cgroup was created,
though...
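That doesn't answer why a group showed up, but as a rough first check,
here is a Python sketch that walks the "cpu" hierarchy and prints each
group's cpu.rt_runtime_us, flagging groups with a zero budget (such as a
freshly created system.slice). It assumes cgroup v1 with the "cpu"
controller reachable at /sys/fs/cgroup/cpu (on some distributions that is
a symlink to /sys/fs/cgroup/cpu,cpuacct); the function name rt_budgets()
is made up for this example.

```python
import os
from typing import Iterator, Tuple

# Assumption: cgroup v1 with the "cpu" controller reachable at this path.
CPU_ROOT = "/sys/fs/cgroup/cpu"


def rt_budgets(root: str = CPU_ROOT) -> Iterator[Tuple[str, int]]:
    """Yield (cgroup path, cpu.rt_runtime_us) for every group under root."""
    for dirpath, _dirs, files in os.walk(root):
        if "cpu.rt_runtime_us" not in files:
            continue  # no RT knob here (e.g. kernel without CONFIG_RT_GROUP_SCHED)
        with open(os.path.join(dirpath, "cpu.rt_runtime_us")) as f:
            runtime = int(f.read())
        rel = os.path.relpath(dirpath, root)
        yield ("/" if rel == "." else "/" + rel), runtime


if __name__ == "__main__":
    # A missing root (e.g. a cgroup v2-only system) simply prints nothing.
    for path, runtime in rt_budgets():
        note = "  <- no RT budget, RT policies are refused here" if runtime == 0 else ""
        print(f"{runtime:>8} {path}{note}")
```

Run as root or as a regular user (the files are readable), this lists
the groups in walk order; the zero entries are the ones where
sched_setscheduler() to SCHED_RR or SCHED_FIFO should be expected to fail.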

Generally, RT group scheduling is not usable unless you explicitly
assign an RT budget to each cgroup that wants to have RT, and you
manually make sure you never hand out more RT budget than is
available. Because that's really nasty, and no good defaults can be
picked for this mode, we don't support it.
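To make the "never hand out more than is available" part concrete, here
is a simplified Python model of the bookkeeping, based on my reading of
the kernel's sched-rt-group rules rather than a reimplementation of its
checks: a group's runtime can't exceed its period, no group's share
(runtime/period) can exceed the global share set by
/proc/sys/kernel/sched_rt_runtime_us and sched_rt_period_us
(950000/1000000 by default), and the shares of a group's children must
not add up to more than the group's own share. RtGroup and
oversubscribed() are hypothetical names; the model ignores the -1
("unlimited") setting and uses exact fractions where the kernel uses
fixed-point arithmetic.

```python
from dataclasses import dataclass, field
from fractions import Fraction
from typing import Dict, List


@dataclass
class RtGroup:
    """One cgroup's cpu.rt_runtime_us / cpu.rt_period_us (runtime >= 0 assumed)."""
    runtime_us: int
    period_us: int = 1_000_000
    children: Dict[str, "RtGroup"] = field(default_factory=dict)

    @property
    def share(self) -> Fraction:
        # Fraction of each period this group may spend running RT tasks.
        return Fraction(self.runtime_us, self.period_us)


def oversubscribed(group: RtGroup, global_share: Fraction, path: str = "/") -> List[str]:
    """List the groups that break the simplified rules, depth first."""
    problems: List[str] = []
    # A group can't get more than its period, nor more than the global limit.
    if group.runtime_us > group.period_us or group.share > global_share:
        problems.append(path)
    # The children's shares together must fit inside the parent's own share.
    if sum((c.share for c in group.children.values()), Fraction(0)) > group.share:
        problems.append(path + " (children)")
    for name, child in group.children.items():
        problems.extend(oversubscribed(child, global_share, path.rstrip("/") + "/" + name))
    return problems


if __name__ == "__main__":
    default_global = Fraction(950_000, 1_000_000)  # kernel defaults: 950000 / 1000000
    tree = RtGroup(950_000, children={
        "system.slice": RtGroup(100_000, children={
            "a.service": RtGroup(60_000),
            "b.service": RtGroup(60_000),
        }),
    })
    print(oversubscribed(tree, default_global))  # ['/system.slice (children)']
```

Dropping a.service and b.service to 50000 each makes the list empty.
Every new RT-using group means redoing this arithmetic for its whole
chain of parents, which is the part that has no good automatic default.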

If you ignore this and try to make it work locally, YMMV. What you
could do is add ExecStartPre= lines to the relevant services (via
drop-ins) that echo an RT budget into the cpu.rt_runtime_us files in
the "cpu" hierarchy, possibly propagating it to the parent cgroups
too, since a child can't be granted more RT budget than its parent
has. To figure out the right cgroup path to echo this into, you'd have
to query /proc/self/cgroup...

Yeah, it's nasty, but at the moment a more automatic and friendlier
exposure of the RT budget logic is not planned, as the kernel APIs are
just impossible to use with automatic management...

Lennart

-- 
Lennart Poettering, Red Hat

