[systemd-devel] issues with large number of units, systemd 204 and 208 [d10k]

Lennart Poettering lennart at poettering.net
Mon Oct 28 12:32:35 PDT 2013


On Thu, 17.10.13 14:09, Joe Miller (joeym at joeym.net) wrote:

> Next - I upgrade to systemd-208, same 16,000 units. systemd completes the
> re-exec but sits at 100% cpu forever (or at least hours, I give up
> waiting). All attempts to access systemd will time out. strace of systemd
> shows that it is calling open() on every object in the /sys/fs/cgroup tree.
> Rebooting the server results in a box that is not able to complete a boot.
> Console shows the server sitting very early in the boot process at the
> "Welcome to f19" message. I suspect systemd is spinning at 100% and cannot
> move forward.

Any chance you can get a stack trace about this?

> Finally - On a hunch, based on the data observed through strace, I modify
> my sample units and create a new .slice for each set of units so that not
> all 16,000 units are placed under the default system.slice. Thus I end up
> with one additional unit for each set, i.e.: 4000 * (test-X.slice,
> test-X.mount, test-X.automount, test-nginx-X.socket, test-nginx-X.service),
> and each of the units in a set is assigned to the relevant test-X.slice.
> This works with systemd-204 which ignores the unknown Slice= settings, then
> I upgrade to systemd-208 which goes smoother, and a reboot of the server
> is successful. `daemon-reload` is fast now: 3s vs. 15s. However,
> start/stop/restart of a service takes 25-30 seconds. Strace of systemd shows
> a lot of open() activity across the cgroup tree.
> 
> 
> I am wondering if something significant changed between 204 and 208 with
> regards to handling of cgroups? Restarting a service in 204, strace shows
> only a handful of open() calls to cgroup nodes that are relevant to the
> service being restarted, but in 208 it appears that systemd may be scanning
> the entire cgroup tree.

So yeah, we have to propagate cgroup attributes between units in the
same slice. We currently check all of a unit's siblings, which is O(n)
in the number of units in the slice. This can be improved to O(1) by
caching the information we need in the parent slice. This appears to
be an easy fix.
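
A minimal sketch of the difference, in Python rather than systemd's C.
It assumes the propagated attribute is the set of cgroup controllers
each unit needs; the Slice and Unit classes and their method names are
hypothetical illustrations, not systemd's actual data structures:

```python
from collections import Counter


class Unit:
    """Hypothetical unit: a name plus the cgroup controllers it needs."""

    def __init__(self, name, controllers=()):
        self.name = name
        self.controllers = frozenset(controllers)


class Slice:
    """Hypothetical slice that tracks what its member units need."""

    def __init__(self, name):
        self.name = name
        self.members = set()
        # Cache: controller name -> number of members that need it.
        self._counts = Counter()

    def add(self, unit):
        self.members.add(unit)
        self._counts.update(unit.controllers)

    def remove(self, unit):
        self.members.remove(unit)
        self._counts.subtract(unit.controllers)

    def controllers_scan(self):
        """O(n) in the number of members: visit every sibling."""
        needed = set()
        for unit in self.members:
            needed |= unit.controllers
        return needed

    def controllers_cached(self):
        """Independent of member count: read the cache kept by add/remove."""
        return {c for c, n in self._counts.items() if n > 0}
```

With thousands of units in one slice, controllers_scan() touches every
sibling on each query, while controllers_cached() only looks at one
counter per controller. The price is that add() and remove() have to
keep the cache in the parent slice up to date.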

I added this to the TODO list for now.

Lennart

-- 
Lennart Poettering, Red Hat

