[systemd-devel] The Linux Way or Some ideas to make systemd better

Sergey sergemdev at gmail.com
Sat Jul 16 20:32:24 PDT 2011


Hello.

I've been thinking about this for some time and have a few ideas I wanted to
share. Sorry for the long email; I'm just trying to explain in detail what
the ideas are about, what benefits they bring and why I've chosen this
approach, i.e. why these ideas cannot be implemented with existing tools.

If you don't want to read the entire email, just scroll down to the words
"systemd packages".


"What's the problem?"
=====================
There's this thing called Linux. I often tell people how great it is - it's so
flexible that you can always modify it the way you want.

But what makes it great? Programs? Features? Price? No, it's Unix philosophy
behind it. The best part of it is:
    "Write programs that do one thing and do it well"
What's so good about it?

The thing is, there's no ideal program. Of course every program has bugs,
but that's not the main reason. The main reason is that people don't like it
when a program does not do what they expect, AND different people expect
different things. This is where the philosophy pays off.

On Windows-like systems, when something does not work the way you want,
it's still a single, highly integrated system and you can't do much about it.
But in Linux, where the system is a number of separated components, you
can replace any part of it and everything else will still work. You don't
like KDE - there's GNOME. You like KDE apps, but don't like KDE WM - easy,
replace it with compiz and use the rest of KDE apps as usual. You need some
light WM to save memory - IceWM is for you, all the browsers and office
programs will still work there. You're setting up a server and don't want
Xorg to eat your resources - no problem, just remove it. If you need to, you
can replace every part of the system: the DM (KDM, GDM, XDM, SLIM), glibc
(uclibc, eglibc), the init daemon (sysvinit, initng, upstart), even the
kernel... It's so cool; not many systems in the world give you such great
flexibility.
All this works because in Linux all programs "do one thing and do it well".

And then comes systemd, which aims to do everything in the world and a few
things more. The question one may ask on seeing it is: is it really a Linux
program?


"Bla-bla-bla, what do you suggest?"
===================================
You can split it up! Make it the Unix way. Instead of a large and highly
integrated bunch of programs make it a number of small separated components
with weak dependencies between them.

That would give many benefits to the project. For developers it will make
the structure a lot cleaner and easier to understand, extend and support.
Developers usually don't like bloated software that nobody-knows-how-it-works.
For users it will give the ability to make it look the way they want it to be.
Any program is perfect for a user if it does exactly what he wants. But it's
impossible to write the program that suits everybody out of the box, so in
the end the most flexible programs win.

By trying to add too many features into a single program ("Let's write a
builtin tetris game that shows up when the user presses C-A-D! Why? Why
not?!") you may end up with none of them implemented well enough. It's rather
pointless to make a change just for the sake of changing something. E.g. you
can fight bash scripts, but what for? For speed? Bash can spawn ~700 scripts
per second and can parse about 5 megabytes of code per second. How much can
you possibly gain on 20 scripts of a few kilobytes each? Instead of guessing
which features could be implemented, let's look at the real tasks that should
be solved, and implement each task in a separate application.

What are those tasks? What exactly are the problems that a new init system
should fix, that could not be fixed before? If the main goal is a faster boot
process (I'm actually following a video presentation of systemd here), then
let's look at the usual boot process:
  [udev] -> [filesystems] -> [iptables] -> [network] -> [syslog] -> [avahi] -> ...
and see what we can do here.

First task - start services in parallel. The key idea is to create all the
sockets and then start all the services... well, you already know that. :)
Let's remember this task.

Second task - check and mount filesystems in parallel with starting services.
Let's remember this task too.

[am I missing anything else related to faster boot?]

So, to get the first task done we need a single daemon, let's call it
`sysunixd`, that does one simple thing - creates a lot of unix sockets and
then starts a lot of applications. That's all. :)
Can we do it using existing applications, like `inetd`? Well, not really. The
key difference between `inetd` and `sysunixd` is that every inetd service is
either started on demand (when someone accesses the socket) or not started
at all. But services in `sysunixd` should be either always started or not
started at all. "On-demand" startup in `sysunixd` may be a nice option, but
not a requirement, because we want to make boot faster, while on-demand start
makes it slower.
Another difference between `sysunixd` and `inetd` is that `sysunixd` must be
runlevel-aware. Users usually configure different services to run in different
runlevels. E.g. you may not need `gpm` in runlevel 5 while running an X server,
but it's a useful tool for copy-pasting configs for that X server in runlevel 3.
So `sysunixd` should know the current runlevel to know which services it must
start.
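A minimal Python sketch of the `sysunixd` loop described above, under stated
assumptions: `UnixService` stands in for a hypothetical /etc/sysunixd.d/ entry,
and handing the listening descriptor to the daemon through a `SYSUNIXD_FD`
environment variable is a convention invented for this sketch, not an existing
interface. It only shows the two steps in order: create every socket first,
then start every daemon for the current runlevel at once.

```python
import os
import socket
import subprocess
from dataclasses import dataclass


@dataclass
class UnixService:
    """Hypothetical /etc/sysunixd.d/ entry; the real file format is undecided."""
    name: str
    socket_path: str
    command: list
    runlevels: frozenset


def services_for_runlevel(services, runlevel):
    """sysunixd is runlevel-aware: only services enabled here are started."""
    return [s for s in services if runlevel in s.runlevels]


def create_listening_socket(path):
    """Bind and listen on a UNIX stream socket before its daemon exists."""
    if os.path.exists(path):
        os.unlink(path)  # remove a stale socket file left from a previous boot
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    sock.listen(128)  # clients can connect from now on
    return sock


def start_service(service, sock):
    """Start the daemon and hand it the already-listening socket.

    SYSUNIXD_FD is a hypothetical convention invented for this sketch."""
    fd = sock.fileno()
    env = dict(os.environ, SYSUNIXD_FD=str(fd))
    return subprocess.Popen(service.command, pass_fds=(fd,), env=env)


def boot(services, runlevel):
    wanted = services_for_runlevel(services, runlevel)
    # Step 1: create ALL sockets, so every service is reachable right away.
    sockets = {s.name: create_listening_socket(s.socket_path) for s in wanted}
    # Step 2: start ALL daemons without waiting for each other; a client that
    # connects early is queued by the kernel until the daemon calls accept().
    procs = {s.name: start_service(s, sockets[s.name]) for s in wanted}
    return procs, sockets
```

The parent keeps the returned sockets open, so a daemon that crashes could be
restarted on the very same socket without clients noticing.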

Apart from daemons using UNIX sockets, there are daemons using INET sockets
(apache, ftp, smtp, etc.), and we apply the same idea to them: there must be a
daemon, let's call it `sysinetd`, which is similar to `sysunixd` but for INET
sockets. It cannot be merged with `sysunixd` because there's a fundamental
difference between UNIX sockets and INET sockets: INET sockets can be
accessed from outside, so they should in no way be created before [iptables].
If you create them before [iptables], you'll get hacked. :)

What can we do for the second task? We can't do much about the root
filesystem - no services can be run before it is checked. But the others can
be checked while some services are starting. So there should be something,
let's call it `sysmountd`, that does one thing - checks all filesystems that
must be checked and mounts all filesystems that must be mounted (well,
technically that's two things). It checks the root filesystem, then sets up
automount points for all filesystems that are to be mounted, goes into the
background (thereby signalling that other services can be started) and checks
the other filesystems there.
`sysmountd` is not really a daemon: when all filesystems are checked and
mounted it exits to free resources, since it has nothing else to do anyway. It
could be a bash script, if a bash script could set up automount points.
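A minimal sketch of the `sysmountd` ordering, assuming hypothetical `check`,
`mount`, `set_automount` and `notify_ready` callbacks (in reality fsck,
mount(2), an autofs mount point and some readiness signal to the init daemon).
Threads stand in for "going into the background". The fstab parsing only
handles the basic whitespace-separated fields, does not special-case swap, and
ignores the ordering between fsck pass numbers greater than 1.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import NamedTuple


class FsEntry(NamedTuple):
    device: str
    mountpoint: str
    fstype: str
    passno: int  # 6th fstab field: fsck order, 0 means "don't check"


def parse_fstab(text):
    """Parse the basic whitespace-separated /etc/fstab fields."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        passno = int(fields[5]) if len(fields) > 5 else 0
        entries.append(FsEntry(fields[0], fields[1], fields[2], passno))
    return entries


def sysmountd(entries, check, mount, set_automount, notify_ready):
    # check/mount/set_automount/notify_ready are hypothetical callbacks.
    root = [e for e in entries if e.mountpoint == "/"]
    others = [e for e in entries if e.mountpoint != "/"]

    # Foreground: nothing may start before the root filesystem is checked.
    for e in root:
        if e.passno:
            check(e)

    # Automount placeholders: an early access to e.g. /home just blocks
    # until the real filesystem is mounted, instead of failing.
    for e in others:
        set_automount(e)

    notify_ready()  # the init daemon may now start other services

    def check_and_mount(e):
        if e.passno:
            check(e)
        mount(e)
        return e.mountpoint

    # "Background": check and mount the remaining filesystems in parallel,
    # then return (a real sysmountd would exit here to free resources).
    with ThreadPoolExecutor() as pool:
        return list(pool.map(check_and_mount, others))
```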

The general boot process then becomes:
  [udev] -> [sysmountd] -> [sysunixd] -> [iptables] -> [sysinetd] -> ...
But something must still start them all. So there must be a daemon, let's
call it `systemd`, a regular sysv-compatible init daemon, that does one
thing - starts the programs listed in /etc/rc.d/rc#.d/ for the current
runlevel.
Can this be done using existing programs (sysvinit/upstart)? Well, yes, but
it won't be as good (see below about compatibility and usability).
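A sketch of that one job, assuming the classic sysv layout named above
(/etc/rc.d/rc#.d/ with `K##name` and `S##name` entries). On entering a
runlevel, the sysv rc logic runs the `K` scripts with `stop` and then the `S`
scripts with `start`, in lexical order. The `run` parameter exists only so the
sketch can be exercised without really running init scripts.

```python
import os
import subprocess


def rc_scripts(rc_root, runlevel, prefix):
    """Entries in rc<N>.d starting with prefix, in lexical (= S## / K##) order."""
    rc_dir = os.path.join(rc_root, f"rc{runlevel}.d")
    names = sorted(n for n in os.listdir(rc_dir) if n.startswith(prefix))
    return [os.path.join(rc_dir, n) for n in names]


def enter_runlevel(runlevel, rc_root="/etc/rc.d", run=subprocess.call):
    # Stop what must not run in this runlevel ...
    for script in rc_scripts(rc_root, runlevel, "K"):
        run([script, "stop"])
    # ... then start what must, in S## order.
    for script in rc_scripts(rc_root, runlevel, "S"):
        run([script, "start"])
```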

Why separate `sysmountd`, `sysunixd` and `sysinetd`, why not just make them a
single daemon? Because that would make things much more complicated without
giving any additional benefits. Even as parts of a single daemon they would
still have to be started one after another. You can't start mounting
filesystems before you start udev (you can do it before udev settles, but not
before it starts). You can't start `sysunixd` before you check the root
filesystem. And you cannot start [sysinetd] before [iptables] either.

Technically it's possible to start [iptables] and [sysinetd] one after another
in a single script from [sysunixd], but it's not a good idea, because it adds
a strict dependency (it becomes impossible to remove iptables) for no good
reason. And it won't work on Debian, where there's no iptables service.

Well, that's all. Implementing these daemons solves these tasks. Nothing else
is _required_. Why don't we start some daemon when a new device appears?
Because that's what udev already does, and we can't do it any better. Why
can't we...? We can, but what for? Every new development means new bugs. If it
has no benefits over existing solutions, it's just a waste of time and makes
users angry because of the new bugs.

There are two more things that must be considered when writing these daemons:
compatibility and usability. If the mysql author wants to use [sysinetd] for
starting his service, he can put his service file in, e.g., /etc/sysinetd.d/.
But mysql must still work on systems without [sysinetd], so its author will
still ship an init.d script too. Since we then have two scripts, `sysinetd`
should somehow tell `systemd` that the init.d script must not be started. And
that's not the only thing that must be "told" to the systemd init daemon.
There should be a way to tell it to switch runlevels or to request the current
runlevel. So there should be some kind of communication. The easiest one is
UNIX sockets in the (Linux) abstract namespace. Why not sockets bound to a
filesystem path? Because the filesystem may be read-only, or may need to be
remounted. Why not reuse some existing model, e.g. dbus? Because we're talking
about an init daemon, not a regular userspace application; dbus would add
complexity, an additional dependency and an additional point of failure,
giving no benefits. For example, when the system is started in single-user
mode there should be no dbus running, but runlevel switching must still work.
Also a dbus-based solution needs hacks and workarounds to keep dbus alive.
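A minimal sketch of such a control channel, assuming a Linux abstract-namespace
UNIX socket: the address starts with a NUL byte, so nothing is created in the
filesystem and a read-only root does not matter. The socket name
`systemd-control` and the one-line text commands `RUNLEVEL`, `SET-RUNLEVEL n`
and `SHADOW name` are made up for illustration; they are not an existing
protocol.

```python
import socket

CONTROL_ADDRESS = "\0systemd-control"  # hypothetical abstract-namespace name
VALID_RUNLEVELS = set("0123456")


def handle_request(line, state):
    """Execute one command of the made-up text protocol and return a reply."""
    cmd, _, arg = line.strip().partition(" ")
    if cmd == "RUNLEVEL":
        return str(state["runlevel"])
    if cmd == "SET-RUNLEVEL" and arg in VALID_RUNLEVELS:
        state["runlevel"] = int(arg)  # a real init would now stop/start services
        return "OK"
    if cmd == "SHADOW" and arg:
        state["shadowed"].add(arg)  # e.g. sysinetd now owns this service
        return "OK"
    return "ERR unknown command"


def make_server(address=CONTROL_ADDRESS):
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(address)  # leading NUL: no file, works on a read-only root
    server.listen(16)
    return server


def serve_once(server, state):
    """Accept one client, answer one request, close the connection."""
    conn, _ = server.accept()
    with conn, conn.makefile("rb") as rfile:
        line = rfile.readline().decode()
        conn.sendall((handle_request(line, state) + "\n").encode())


def request(line, address=CONTROL_ADDRESS):
    """Client side, as chkconfig/service/runlevel could use it."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(address)
        sock.sendall((line + "\n").encode())
        with sock.makefile("rb") as rfile:
            return rfile.readline().decode().strip()
```

Nothing here needs dbus or a writable filesystem, which is the property the
single-user-mode argument above relies on.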

Similar communication would happen between `sysunixd` and `systemd`, to
"shadow" from `systemd` the services that are started by `sysunixd`. To reduce
code duplication, the common communication code can be put into a library.
Let's call it `libsystemd`. That library would have a simple interface, like
list(service|ALL), start(), stop(), status(), disable(permanently|temporarily)
and enable(permanently|temporarily); a "temporary" disable would be used to
shadow a sysv service. The same library can contain a selection method,
connect(DEFAULT|systemd|sysunixd|sysinetd), to specify which daemon should be
controlled. The server side, the listening socket, can be part of the same
library too.
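To make the proposed interface more concrete, here is a sketch of its
semantics with an in-memory table standing in for one daemon's state; no real
communication happens. The names `ServiceTable`, `Target`, `Scope` and
`connect` are hypothetical. The point it illustrates is the difference between
the two kinds of disable: a permanent one changes the stored setting, a
temporary one only shadows a sysv service while another daemon owns it.

```python
from enum import Enum


class Target(Enum):
    DEFAULT = "default"
    SYSTEMD = "systemd"
    SYSUNIXD = "sysunixd"
    SYSINETD = "sysinetd"


class Scope(Enum):
    PERMANENTLY = "permanently"
    TEMPORARILY = "temporarily"


ALL = None  # list(ALL) lists every service


class ServiceTable:
    """Hypothetical in-memory stand-in for one daemon's libsystemd state."""

    def __init__(self, services):
        self._enabled = dict.fromkeys(services, True)  # stored setting
        self._shadowed = set()  # temporary: owned by another daemon
        self._running = set()

    def status(self, service):
        if service in self._running:
            return "running"
        if service in self._shadowed:
            return "shadowed"
        return "enabled" if self._enabled[service] else "disabled"

    def list(self, service=ALL):
        names = sorted(self._enabled) if service is ALL else [service]
        return {name: self.status(name) for name in names}

    def start(self, service):
        if service in self._shadowed:
            return False  # another daemon starts it; don't start it twice
        self._running.add(service)
        return True

    def stop(self, service):
        self._running.discard(service)

    def disable(self, service, scope):
        if scope is Scope.PERMANENTLY:
            self._enabled[service] = False
        else:
            self._shadowed.add(service)

    def enable(self, service, scope):
        if scope is Scope.PERMANENTLY:
            self._enabled[service] = True
        else:
            self._shadowed.discard(service)


def connect(daemons, target=Target.DEFAULT):
    """Select which daemon to control; DEFAULT means the init daemon."""
    if target is Target.DEFAULT:
        target = Target.SYSTEMD
    return daemons[target]
```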

As for usability, the new init system must be easy to use. After all,
programs are written to be used, not just for the sake of writing them. So it
must be simple. For example, in the good old days, to find out which services
start in each runlevel an admin needed just one command:
------- chkconfig output -------
$ chkconfig --list
acpid           0:off   1:off   2:on    3:on    4:on    5:on    6:off
atd             0:off   1:off   2:off   3:on    4:on    5:on    6:off
avahi-daemon    0:off   1:off   2:off   3:on    4:on    5:on    6:off
crond           0:off   1:off   2:on    3:on    4:on    5:on    6:off
exim            0:off   1:off   2:on    3:on    4:on    5:on    6:off
haldaemon       0:off   1:off   2:off   3:on    4:on    5:on    6:off
ip6tables       0:off   1:off   2:on    3:on    4:on    5:on    6:off
iptables        0:off   1:off   2:on    3:on    4:on    5:on    6:off
irqbalance      0:off   1:off   2:on    3:on    4:on    5:on    6:off
mdmonitor       0:off   1:off   2:on    3:on    4:on    5:on    6:off
messagebus      0:off   1:off   2:off   3:on    4:on    5:on    6:off
microcode_ctl   0:off   1:off   2:on    3:on    4:on    5:on    6:off
mysqld          0:off   1:off   2:off   3:off   4:off   5:off   6:off
netfs           0:off   1:off   2:off   3:on    4:on    5:on    6:off
network         0:off   1:off   2:on    3:on    4:on    5:on    6:off
ntpd            0:off   1:off   2:on    3:on    4:on    5:on    6:off
sshd            0:off   1:off   2:on    3:on    4:on    5:on    6:off
rsyslog         0:off   1:off   2:on    3:on    4:on    5:on    6:off
udev-post       0:off   1:off   2:off   3:on    4:on    5:on    6:off
vsftpd          0:off   1:off   2:on    3:on    4:on    5:on    6:off
xinetd          0:off   1:off   2:off   3:on    4:on    5:on    6:off

xinetd based services:
        chargen-dgram:  off
        chargen-stream: off
        cvs:            off
        daytime-dgram:  off
        daytime-stream: off
        discard-dgram:  off
        discard-stream: off
        echo-dgram:     off
        echo-stream:    off
        rsync:          off
        tcpmux-server:  off
        time-dgram:     off
        time-stream:    off
---- end of chkconfig output ---
(that's real chkconfig output; it used to support xinetd services)
Looking at that list, one could tell for sure which services are started and
which are not. Without looking through a bunch of recursive dependencies,
without running a complex script, it was obvious that mysqld won't be
started. Even if some other service tries to use mysqld, it still won't be
started.
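Part of what makes such a listing easy to read is that each line is a flat,
fixed-width record. A sketch of producing it, assuming the column widths seen
in the output above (16 characters for the name, 8 per runlevel field) and an
optional `[daemon]` section header like the one proposed next. This is not
chkconfig's actual code, and names longer than 15 characters would need extra
handling.

```python
RUNLEVELS = range(7)


def chkconfig_line(name, enabled_runlevels):
    """One 'name  0:off 1:off 2:on ...' line with fixed-width columns."""
    fields = [f"{rl}:{'on' if rl in enabled_runlevels else 'off'}"
              for rl in RUNLEVELS]
    return (name.ljust(16) + "".join(f.ljust(8) for f in fields)).rstrip()


def chkconfig_list(sections):
    """sections: (header or None, {service: enabled runlevels}) pairs."""
    lines = []
    for header, services in sections:
        if header:
            lines += ["", f"[{header}]"]
        lines += [chkconfig_line(n, services[n]) for n in sorted(services)]
    return "\n".join(lines)
```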

New chkconfig output can look like this:
------- chkconfig output -------
$ chkconfig --list
ip6tables       0:off   1:off   2:on    3:on    4:on    5:on    6:off
iptables        0:off   1:off   2:on    3:on    4:on    5:on    6:off
microcode_ctl   0:off   1:off   2:on    3:on    4:on    5:on    6:off
network         0:off   1:off   2:on    3:on    4:on    5:on    6:off
sysmountd       0:off   1:on    2:on    3:on    4:on    5:on    6:off
sysinetd        0:off   1:on    2:on    3:on    4:on    5:on    6:off
sysunixd        0:off   1:on    2:on    3:on    4:on    5:on    6:off
udev            0:off   1:on    2:on    3:on    4:on    5:on    6:off
udev-post       0:off   1:off   2:off   3:on    4:on    5:on    6:off

[sysunixd]
acpid           0:off   1:off   2:on    3:on    4:on    5:on    6:off
atd             0:off   1:off   2:off   3:on    4:on    5:on    6:off
avahi-daemon    0:off   1:off   2:off   3:on    4:on    5:on    6:off
crond           0:off   1:off   2:on    3:on    4:on    5:on    6:off
haldaemon       0:off   1:off   2:off   3:on    4:on    5:on    6:off
irqbalance      0:off   1:off   2:on    3:on    4:on    5:on    6:off
mdmonitor       0:off   1:off   2:on    3:on    4:on    5:on    6:off
messagebus      0:off   1:off   2:off   3:on    4:on    5:on    6:off
rsyslog         0:off   1:off   2:on    3:on    4:on    5:on    6:off

[sysinetd]
exim            0:off   1:off   2:on    3:on    4:on    5:on    6:off
mysqld          0:off   1:off   2:off   3:off   4:off   5:off   6:off
netfs           0:off   1:off   2:off   3:on    4:on    5:on    6:off
ntpd            0:off   1:off   2:on    3:on    4:on    5:on    6:off
sshd            0:off   1:off   2:on    3:on    4:on    5:on    6:off
vsftpd          0:off   1:off   2:on    3:on    4:on    5:on    6:off
xinetd          0:off   1:off   2:off   3:on    4:on    5:on    6:off

xinetd based services:
        chargen-dgram:  off
        chargen-stream: off
        cvs:            off
        daytime-dgram:  off
        daytime-stream: off
        discard-dgram:  off
        discard-stream: off
        echo-dgram:     off
        echo-stream:    off
        rsync:          off
        tcpmux-server:  off
        time-dgram:     off
        time-stream:    off
---- end of chkconfig output ---
That way the user/admin won't even notice that something has changed, since
everything looks the same. To get such compatibility, `chkconfig` should be
aware of systemd, sysunixd, sysinetd... The same `libsystemd` can be used to
do that. To stay backward compatible, libsystemd should be written with
fallback support: e.g. if start(mysql) is called and it cannot contact
`systemd`, it should fall back to `/etc/init.d/mysql start`.
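A sketch of that fallback, assuming a hypothetical `send_to_daemon` callable
that delivers a command (here a made-up `START name`) to `systemd` and returns
an exit status, or raises `ConnectionRefusedError` / `FileNotFoundError` when
nothing is listening, which is what connecting a UNIX socket typically raises
in that case.

```python
import os
import subprocess


def start(service, send_to_daemon, init_d="/etc/init.d", run=subprocess.call):
    """start(mysql): ask systemd first, fall back to the plain sysv script.

    send_to_daemon and the "START name" message are hypothetical."""
    try:
        return send_to_daemon(f"START {service}")
    except (ConnectionRefusedError, FileNotFoundError):
        # No systemd listening: behave exactly like `/etc/init.d/mysql start`
        # on a system that has never heard of systemd.
        return run([os.path.join(init_d, service), "start"])
```

The same pattern would apply to stop(), status() and the rest, which is what
keeps `chkconfig` and `service` usable on systems without these daemons.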

Note: the structure is flat, without any complex nested relations. Recursive
dependencies are not easy to display and hard to follow. They give no benefit
in the speed of starting services, but make it much harder to track whether
some service is going to start or not. When the admin says that mysql must be
stopped, it must be stopped, even if phpmyadmin from inside httpd tries to use
it. And when the admin says `service XXX restart`, only the XXX service must be
restarted, nothing else. The `service` utility may print a warning about other
services using the same socket, but must not enforce any dependency (e.g.
restarting network or syslog should not restart the entire system).

There's one more thing systemd is currently doing: cgroups. I'm not sure what
they're for, since they don't solve either of these tasks and don't increase
boot speed, but if they're really needed, they can be supported as a runner
plugin for `libsystemd`. Every systemd daemon uses `libsystemd` to run
services, so there can be a common function that is used to start a service.
By default it would be a simple exec, with plugin support. Why not just embed
it in the systemd code? Because that's an additional, unnecessary dependency
that may break things. E.g. someone may want to build a kernel without cgroups,
or may want to use these daemons on a system that has no cgroups support
(BSD/Solaris?). Or maybe someone wants cgroups to be managed by a different
application (ulatencyd).
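A sketch of the runner-plugin idea, with a dict standing in for plugins loaded
from /lib/systemd-plugins/. The default runner is a plain exec; the example
`nice` plugin raises the service's niceness by 10. A cgroups plugin would have
the same shape but move the child into a cgroup, which is not shown here. All
names are hypothetical, and falling back to plain exec when a plugin is
missing is a choice made for this sketch.

```python
import os
import subprocess

RUNNERS = {}  # hypothetical: plugin name -> fn(argv, **popen_kwargs) -> Popen


def runner(name):
    """Register a runner plugin under the given name."""
    def register(fn):
        RUNNERS[name] = fn
        return fn
    return register


@runner("exec")
def exec_runner(argv, **popen_kwargs):
    """Default runner: just start the program, nothing else."""
    return subprocess.Popen(argv, **popen_kwargs)


@runner("nice")
def nice_runner(argv, **popen_kwargs):
    """Example plugin: start the service with its niceness raised by 10."""
    return subprocess.Popen(argv, preexec_fn=lambda: os.nice(10), **popen_kwargs)


def start_service(argv, runner_name="exec", **popen_kwargs):
    # A missing plugin (e.g. no cgroups support on this system) must not
    # stop the service from starting, so fall back to the plain exec runner.
    run = RUNNERS.get(runner_name, RUNNERS["exec"])
    return run(argv, **popen_kwargs)
```

Because every daemon goes through `start_service`, installing or removing a
plugin changes how services run without touching `systemd`, `sysunixd` or
`sysinetd` themselves.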

There's another benefit you get from a plugin-based runner: an easily
extendable way of running services. E.g. if you want to put every service in
some specific environment, you just need to install a runner plugin.

Such a structure is also easy to extend: e.g. if you want to expose these
features over dbus for some configuration utilities (system-config-services),
you just need to write a dbus interface to `libsystemd`.

So the entire project can be split into the following systemd packages:
* libsystemd
    content: /lib/libsystemd.so /lib/systemd-plugins/
    configs: no
    description: library containing the client and server parts of the
    systemd communication protocol
* systemd (requires libsystemd)
    content: /sbin/systemd
    configs: /etc/inittab /etc/rc.d/
    description: sysv-compatible init daemon, nothing else
* sysunixd (requires libsystemd)
    content: /sbin/sysunixd
    configs: /etc/sysunixd.d/
    description: inetd-like daemon, starts daemons using UNIX sockets
* sysinetd (requires libsystemd)
    content: /sbin/sysinetd
    configs: /etc/sysinetd.d/
    description: inetd-like daemon, starts daemons using INET sockets
* sysmountd
    content: /sbin/sysmountd
    configs: /etc/fstab /etc/crypttab /etc/fstab.d/ maybe
    description: checks the root filesystem in the foreground and the others
    in the background.
    It does not share code with systemd and can be a totally separate package.
* systemd-utils (requires libsystemd)
    content: /sbin/chkconfig /sbin/service /sbin/runlevel
    configs: no
    description: sysv-compatible utilities for systemd daemons
* systemd-cgroups (requires libsystemd)
    content: /lib/systemd-plugins/libsystemd-runner-cgroups.so
    configs: no
    description: runner plugin for libsystemd
* systemd-dbus (requires libsystemd)
    content: /etc/dbus-1/system.d/systemd.conf and the service itself
    configs: no
    description: dbus interface to libsystemd


"Why is this better?"
=====================
Because it's flexible, portable, simple, easy to support and it's unix-way.

Such a structure would work under almost any circumstances, on almost any
configuration. Users of other Linux distributions can install
`sysunixd`/`sysinetd` and they will work out of the box, together with any
existing init system.

If someone who has a tightly integrated bunch of native upstart scripts cannot
switch the entire init system, he can still install `sysunixd` and `sysinetd`
and get some speed improvements. Even if he cannot use `sysunixd` and
`sysinetd` for starting services, he can still benefit from faster disk
mounting with `sysmountd`.

It's extremely portable. Users of other operating systems can install almost
every component (except systemd-cgroups). Such a structure can even be ported
to systems that don't support UNIX sockets (just disable `sysunixd` and patch
`libsystemd` to use a different communication method).

It does not have any extra dependencies. So if someone needs to build a
compact system for a netbook with 128MB of RAM, he can install just `systemd`
and save a few MB of RAM by going without dbus (Xorg+IceWM don't need dbus)
and the other systemd services. There's also no need for dbus on
ssh/dns/http servers.

Even the `systemd-utils` package alone, without any of the other systemd
daemons, would still be useful, allowing users to put every service into a
special running environment using runner plugins.

And weak dependencies between services make debugging easier. When something
breaks, it's easy to find the culprit and fix the bug.


"Is this all?"
==============
No, there are other ways to improve the structure.

For example, this structure forces developers (or maintainers) of every service
to write two startup scripts: one for systemd-based systems and one sysv-init
script for non-systemd-based systems. Why not use a single script for both?
For example, instead of writing a second startup script, we just add the line:
  # TCPListen: 22
to the LSB header of /etc/init.d/sshd. That makes the transition easy and
stays fully backward compatible. Services that don't have this line will be
treated as usual sysv services.

That also makes developers' work easier - they won't have to write two
startup scripts and worry about compatibility.
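A sketch of how a tool could read such a header, assuming the usual LSB
comment block between `### BEGIN INIT INFO` and `### END INIT INFO`.
`TCPListen` is the key proposed here, not part of the LSB specification;
allowing several space-separated ports, and the `classify` helper, are
assumptions of this sketch.

```python
def parse_lsb_header(script_text):
    """Return the 'Key: value' pairs of the LSB INIT INFO block."""
    fields, inside, last_key = {}, False, None
    for line in script_text.splitlines():
        stripped = line.strip()
        if stripped == "### BEGIN INIT INFO":
            inside = True
        elif stripped == "### END INIT INFO":
            break
        elif not inside or not stripped.startswith("#"):
            continue
        elif stripped.startswith(("#\t", "#  ")):
            if last_key:  # continuation line, e.g. of a long Description
                fields[last_key] += " " + stripped[1:].strip()
        else:
            key, sep, value = stripped[1:].partition(":")
            if sep:
                last_key = key.strip()
                fields[last_key] = value.strip()
    return fields


def classify(script_text):
    """Hypothetical: sysinetd owns the service if it declares TCP ports."""
    header = parse_lsb_header(script_text)
    ports = [int(p) for p in header.get("TCPListen", "").split()]
    return ("sysinetd", ports) if ports else ("sysv", [])
```

A `chkconfig` built on this could list sshd under [sysinetd], while
/etc/init.d/sshd keeps working unchanged on a plain sysvinit system, since
the extra line is just a shell comment there.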

But let's leave the details alone. This email is already long enough. :)


The End
=======
I'm writing these ideas here because I'm not experienced enough to implement
them myself. So I just hope that people on this list, who are much more
experienced, will like them. Thanks for reading.

If you have any questions, please ask.

-- 
Inspired by http://en.wikipedia.org/wiki/Unix_philosophy
  Sergey

