[systemd-devel] The Linux Way or Some ideas to make systemd better

微菜 microcai at fedoraproject.org
Sat Jul 16 22:50:50 PDT 2011


Sounds like the UNIX way, but it really isn't.


On 2011-07-17 11:32, Sergey wrote:
> Hello.
> 
> I was thinking about it for some time, and got a few ideas I wanted to
> share. Sorry for the long email, I'm just trying to explain in detail
> what the ideas are about, what benefits they bring and why I've chosen
> this way, i.e. why these ideas cannot be done using existing tools.
> 
> If you don't want to read the entire email, just scroll down to the words
> "systemd packages".
> 
> 
> "What's the problem?"
> =====================
> There's such a thing as Linux. I often tell people how great it is - it's so
> flexible, you can always modify it the way you want.
> 
> But what makes it great? Programs? Features? Price? No, it's the Unix
> philosophy behind it. The best part of it is:
>     "Write programs that do one thing and do it well"
> What's so good about it?
> 
> The point is that there's no ideal program. Of course every program has
> bugs, but that's not the main reason. The main reason is that people don't
> like it when a program does not do what they expect, AND different people
> expect different things. This is where the philosophy comes in.
> 
> On Windows-like systems, when something does not work the way you want,
> it's still a single highly integrated system, and you can't do much about it.
> But in Linux, where the system is a number of separate components, you
> can replace any part of it and everything else will still work. You don't
> like KDE - there's GNOME. You like KDE apps, but don't like KDE WM - easy,
> replace it with compiz and use the rest of KDE apps as usual. You need some
> light WM to save memory - IceWM is for you, all the browsers and office
> programs will still work there. You're setting up a server and don't want
> Xorg to eat your resources - no problem, just remove it. If you need to, you
> can replace every part of the system: DM (KDM, GDM, XDM, SLIM), glibc
> (uclibc, eglibc), init daemon (sysvinit, initng, upstart), even the kernel...
> It's so cool, not many systems in the world give you such great flexibility.
> All this works because in Linux all programs "do one thing and do it well".
> 
> And then comes systemd, which aims to do everything in the world and a
> few things more. The question that comes to mind when you see it: is it
> really a Linux program?
> 
> 
> "Bla-bla-bla, what do you suggest?"
> ===================================
> You can split it up! Make it the Unix way. Instead of a large and highly
> integrated bunch of programs make it a number of small separated components
> with weak dependencies between each other.
> 
> That would give many benefits to the project. For developers it will make
> the structure a lot cleaner, easier to understand, extend and support.
> Developers usually don't like bloated software that nobody-knows-how-it-works.
> For users it will give the ability to make it behave the way they want.
> Any program is perfect for a user if it does exactly what he wants. But it's
> impossible to write a program that suits everybody out of the box, so in
> the end the most flexible programs win.
> 
> By trying to add too many features to a single program ("Let's write a builtin
> tetris game that shows up when the user presses C-A-D! Why? Why not?!") you
> may end up with none of them implemented well enough. It's rather pointless
> to make a change just to change something. E.g. you can fight bash
> scripts, but what for? For speed? Bash can spawn ~700 scripts per second and
> can parse about 5 megabytes of code per second. How much can you
> possibly win for 20 scripts of a few kilobytes each? Instead of guessing what
> features could be implemented, let's look at the real tasks that should be
> solved, and implement each task in a separate application.
> 
> What are those tasks? What exactly are the problems that should be fixed
> with a new init system, that could not be fixed before? If the main goal is
> a faster boot process (I'm actually following a video presentation of
> systemd here), then let's look at the usual boot process:
>   [udev] -> [filesystems] -> [iptables] -> [network] -> [syslog] -> [avahi] -> ...
> and see what we can do here.
> 
> First task - start services in parallel. The key idea is to create all the
> sockets and then start all the services... well, you already know that. :)
> Let's remember this task.
> 
> Second task - check and mount filesystems in parallel with starting services.
> Let's remember this task too.
> 
> [am I missing anything else related to faster boot?]
> 
> So, to get the first task done we need a single daemon, let's call it
> `sysunixd`, that does one simple thing - creates a lot of unix sockets and
> then starts a lot of applications. That's all. :)
> Can we do it using existing applications, like `inetd`? Well, not really. The
> key difference between `inetd` and `sysunixd` is that every `inetd` service is
> either started on demand (when someone accesses the socket) or is not started
> at all. But services in `sysunixd` should be either started always or not
> started at all. "On-demand" startup in `sysunixd` may be a nice option, but
> not a requirement, because we want to make boot faster, while on-demand start
> makes it slower.
> Another difference between `sysunixd` and `inetd` is that `sysunixd` must be
> runlevel-aware. Users usually configure different services to run in different
> runlevels. E.g. you may not need `gpm` in init5 while running an X server, but
> it's a useful tool for copy-pasting configs for that X server in init3. So
> `sysunixd` should know the current runlevel to know which services it must start.
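
A minimal sketch of the core `sysunixd` idea described above: create the listening UNIX socket first, then start the service and hand it the already-open socket. This is only an illustration under assumptions, not part of the proposal: the `LISTEN_FD` environment variable, the function names and the toy service are all hypothetical, and runlevel handling is left out.

```python
import os
import socket
import subprocess
import sys
import tempfile


def create_listening_socket(path):
    """Bind and listen on a UNIX socket before the service itself exists."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    sock.listen(16)
    return sock


def start_service(argv, sock):
    """Start the service and pass it the already-listening socket."""
    fd = sock.fileno()
    # LISTEN_FD is a hypothetical variable name chosen for this sketch.
    env = dict(os.environ, LISTEN_FD=str(fd))
    return subprocess.Popen(argv, pass_fds=(fd,), env=env)


# Toy stand-in for a real daemon: adopts the inherited socket, serves one client.
TOY_SERVICE = (
    "import os, socket\n"
    "s = socket.socket(fileno=int(os.environ['LISTEN_FD']))\n"
    "conn, _ = s.accept()\n"
    "conn.sendall(b'ready')\n"
    "conn.close()\n"
)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy.sock")
        sock = create_listening_socket(path)
        # The client connects before the service is started; the kernel
        # queues the connection in the listening socket's backlog.
        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        client.connect(path)
        proc = start_service([sys.executable, "-c", TOY_SERVICE], sock)
        sock.close()  # the service now owns its copy of the socket
        reply = client.recv(16)
        client.close()
        proc.wait()
        return reply
```

Calling `demo()` shows a client connecting before the service is running: the connection waits in the backlog until the service accepts it, which is what allows dependent services to be started without waiting for each other.
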
> 
> Apart from daemons using unix sockets, there're daemons using INET sockets
> (apache, ftp, smtp, etc), and we apply the same idea to them - there must be a
> daemon, let's call it `sysinetd`, which is similar to `sysunixd`, but for INET
> sockets. It cannot be merged with `sysunixd` because there's a fundamental
> difference between UNIX sockets and INET sockets - INET sockets can be
> accessed from outside, so they must never be created before [iptables].
> If you create them before [iptables] - you'll get hacked. :)
> 
> What can we do for the second task? We can't do much about the root
> filesystem - no services can be run before the root filesystem is checked.
> But the others can be checked while some services are starting. So there
> should be something, let's call it `sysmountd`, that does one thing - checks
> all filesystems that must be checked, and mounts all filesystems that must be
> mounted (well, technically that's two things). It checks the root filesystem,
> then puts automount points on all to-be-mounted filesystems and goes into the
> background (notifying this way that other services can be started) and checks
> the other filesystems in the background.
> `sysmountd` is not really a daemon, because when all filesystems are
> checked and mounted it exits to free resources; it has nothing else to do
> anyway. It could be a bash script, if a bash script could set automount points.
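
The ordering described for `sysmountd` can be sketched roughly as follows. This is a hedged illustration only: `check`, `place_automount` and `mount` are hypothetical callbacks standing in for fsck, automount setup and the real mount, and a background thread stands in for "going into the background".

```python
import threading


def sysmountd_sketch(root, others, check, place_automount, mount):
    """Check root in the foreground, then finish the rest in the background.

    All three callbacks are hypothetical placeholders for this sketch.
    """
    # 1. Nothing can run before the root filesystem is checked.
    check(root)

    # 2. Put automount points in place so services can already be started.
    for fs in others:
        place_automount(fs)

    # 3. Check and mount the remaining filesystems in the background.
    def finish():
        for fs in others:
            check(fs)
            mount(fs)

    worker = threading.Thread(target=finish)
    worker.start()
    return worker  # the caller may start services now and join() later


def demo():
    events = []
    worker = sysmountd_sketch(
        "/",
        ["/home", "/var"],
        check=lambda fs: events.append(("check", fs)),
        place_automount=lambda fs: events.append(("automount", fs)),
        mount=lambda fs: events.append(("mount", fs)),
    )
    worker.join()
    return events
```

In `demo()` the recorded events show the root check first, then both automount points, and only then the per-filesystem check and mount steps.
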
> 
> A general boot process becomes like this:
>   [udev] -> [sysmountd] -> [sysunixd] -> [iptables] -> [sysinetd] -> ...
> But something must still start them all. So there must be some
> daemon, let's call it `systemd`, a regular sysv-compatible init-daemon,
> that does one thing - starts programs listed in /etc/rc.d/rc#.d/ for the
> current runlevel.
> Can this be done using existing programs (sysvinit/upstart)? Well, yes, but
> it won't be that good (see below about compatibility and usability).
> 
> Why separate `sysunixd` and `sysinetd`, why not just make them a
> single daemon? Because that will make things much more complicated, and won't
> give any additional benefits. Even as parts of a single daemon they
> would still be started one by one. You can't start mounting filesystems before
> you start udev (you can do it before udev settles, but not before it starts).
> You can't start `sysunixd` before you check the root filesystem. You cannot
> start [sysinetd] before [iptables] either.
> 
> Technically it's possible to start [iptables] and [sysinetd] one after another
> in a single script from [sysunixd], but it's not a good idea, because it adds
> a strict dependency (impossible to remove iptables) for no good reason. And it
> won't work on Debian, where there's no iptables service.
> 
> Well, that's all. Implementing these daemons solves these tasks. Nothing else
> is _required_. Why don't we start some daemon when a new device appears?
> Because that's what udev already does. And we can't do it any better. Why
> can't we...? We can, but what for? Every new development means new errors.
> And if it has no benefits over existing solutions, it's just a waste of time
> that makes users angry because of new bugs.
> 
> There're two more things that must be considered when writing these daemons -
> Compatibility and Usability. If the mysql author wants to use [sysinetd] for
> starting his service he can put his service file in, e.g., /etc/sysinetd.d/.
> But mysql must still work on systems without [sysinetd], so its author will
> still put an init.d script there. Since we have two scripts, `sysinetd` should
> somehow tell `systemd` that the init.d script must not get started. And that's
> not the only thing that must be "told" to the systemd init daemon. There should
> be a way to tell it to switch runlevel or to request the current runlevel. So
> there should be some kind of communication. The easiest one is to use UNIX
> sockets. Why not use filesystem sockets? Because the filesystem may be
> read-only, or may need to be remounted. Why not reuse some existing model,
> e.g. dbus? Because we're talking about an init daemon, not about a regular
> userspace application; dbus would add additional complexity, an additional
> dependency and an additional point of failure, giving no benefits. E.g. when
> the system is started in single-user mode there should be no DBUS running,
> but runlevel switching must still work. Also a dbus-based solution needs hacks
> and workarounds to keep dbus alive.
> 
> Similar communication would take place between `sysunixd` and `systemd` to
> "shadow" from `systemd` those services that are started by `sysunixd`. To
> reduce code duplication, the common communication code can be put into a
> library. Let's call it `libsystemd`. That library would have some simple
> interface, like list(service|ALL), start(), stop(), status(),
> disable(permanently|temporarily) ("temporarily" disable would be used to
> shadow a sysv service) and enable(permanently|temporarily).
> The same library can contain a selection method to specify which daemon should
> be controlled - a method connect(DEFAULT|systemd|sysunixd|sysinetd). The server
> side, a listening socket, can actually be a part of the same library.
> 
> On the other hand, the new init system should be easy to use. After all,
> programs are created not just to be written, but to be used. So it must be
> simple and easy to use. For example, in the good old days, to find out which
> services start on each runlevel an admin needed just one command:
> ------- chkconfig output -------
> $ chkconfig --list
> acpid           0:off   1:off   2:on    3:on    4:on    5:on    6:off
> atd             0:off   1:off   2:off   3:on    4:on    5:on    6:off
> avahi-daemon    0:off   1:off   2:off   3:on    4:on    5:on    6:off
> crond           0:off   1:off   2:on    3:on    4:on    5:on    6:off
> exim            0:off   1:off   2:on    3:on    4:on    5:on    6:off
> haldaemon       0:off   1:off   2:off   3:on    4:on    5:on    6:off
> ip6tables       0:off   1:off   2:on    3:on    4:on    5:on    6:off
> iptables        0:off   1:off   2:on    3:on    4:on    5:on    6:off
> irqbalance      0:off   1:off   2:on    3:on    4:on    5:on    6:off
> mdmonitor       0:off   1:off   2:on    3:on    4:on    5:on    6:off
> messagebus      0:off   1:off   2:off   3:on    4:on    5:on    6:off
> microcode_ctl   0:off   1:off   2:on    3:on    4:on    5:on    6:off
> mysqld          0:off   1:off   2:off   3:off   4:off   5:off   6:off
> netfs           0:off   1:off   2:off   3:on    4:on    5:on    6:off
> network         0:off   1:off   2:on    3:on    4:on    5:on    6:off
> ntpd            0:off   1:off   2:on    3:on    4:on    5:on    6:off
> sshd            0:off   1:off   2:on    3:on    4:on    5:on    6:off
> rsyslog         0:off   1:off   2:on    3:on    4:on    5:on    6:off
> udev-post       0:off   1:off   2:off   3:on    4:on    5:on    6:off
> vsftpd          0:off   1:off   2:on    3:on    4:on    5:on    6:off
> xinetd          0:off   1:off   2:off   3:on    4:on    5:on    6:off
> 
> xinetd based services:
>         chargen-dgram:  off
>         chargen-stream: off
>         cvs:            off
>         daytime-dgram:  off
>         daytime-stream: off
>         discard-dgram:  off
>         discard-stream: off
>         echo-dgram:     off
>         echo-stream:    off
>         rsync:          off
>         tcpmux-server:  off
>         time-dgram:     off
>         time-stream:    off
> ---- end of chkconfig output ---
> (that's a real chkconfig output, it used to support xinetd services)
> And looking at those services one could tell for sure which services are
> started and which are not. Without looking through a bunch of recursive
> dependencies, without running a complex script it was obvious that
> mysqld won't be started. Even if some other service tries to use mysqld
> it still won't be started.
> 
> New chkconfig output can look like this:
> ------- chkconfig output -------
> $ chkconfig --list
> ip6tables       0:off   1:off   2:on    3:on    4:on    5:on    6:off
> iptables        0:off   1:off   2:on    3:on    4:on    5:on    6:off
> microcode_ctl   0:off   1:off   2:on    3:on    4:on    5:on    6:off
> network         0:off   1:off   2:on    3:on    4:on    5:on    6:off
> sysmountd       0:off   1:on    2:on    3:on    4:on    5:on    6:off
> sysinetd        0:off   1:on    2:on    3:on    4:on    5:on    6:off
> sysunixd        0:off   1:on    2:on    3:on    4:on    5:on    6:off
> udev            0:off   1:on    2:on    3:on    4:on    5:on    6:off
> udev-post       0:off   1:off   2:off   3:on    4:on    5:on    6:off
> 
> [sysunixd]
> acpid           0:off   1:off   2:on    3:on    4:on    5:on    6:off
> atd             0:off   1:off   2:off   3:on    4:on    5:on    6:off
> avahi-daemon    0:off   1:off   2:off   3:on    4:on    5:on    6:off
> crond           0:off   1:off   2:on    3:on    4:on    5:on    6:off
> haldaemon       0:off   1:off   2:off   3:on    4:on    5:on    6:off
> irqbalance      0:off   1:off   2:on    3:on    4:on    5:on    6:off
> mdmonitor       0:off   1:off   2:on    3:on    4:on    5:on    6:off
> messagebus      0:off   1:off   2:off   3:on    4:on    5:on    6:off
> rsyslog         0:off   1:off   2:on    3:on    4:on    5:on    6:off
> 
> [sysinetd]
> exim            0:off   1:off   2:on    3:on    4:on    5:on    6:off
> mysqld          0:off   1:off   2:off   3:off   4:off   5:off   6:off
> netfs           0:off   1:off   2:off   3:on    4:on    5:on    6:off
> ntpd            0:off   1:off   2:on    3:on    4:on    5:on    6:off
> sshd            0:off   1:off   2:on    3:on    4:on    5:on    6:off
> vsftpd          0:off   1:off   2:on    3:on    4:on    5:on    6:off
> xinetd          0:off   1:off   2:off   3:on    4:on    5:on    6:off
> 
> xinetd based services:
>         chargen-dgram:  off
>         chargen-stream: off
>         cvs:            off
>         daytime-dgram:  off
>         daytime-stream: off
>         discard-dgram:  off
>         discard-stream: off
>         echo-dgram:     off
>         echo-stream:    off
>         rsync:          off
>         tcpmux-server:  off
>         time-dgram:     off
>         time-stream:    off
> ---- end of chkconfig output ---
> That way the user/admin won't even notice that something has changed, since
> everything looks the same. To get such compatibility `chkconfig` should be
> aware of systemd, sysunixd, sysinetd... The same `libsystemd` can be used
> to do that. To make it backward compatible, libsystemd should be written
> with fallback support - e.g. if a start(mysql) command was called and it
> could not contact `systemd`, it should fall back to /etc/init.d/mysql start.
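
The fallback behaviour described above might look roughly like this. It is a sketch under stated assumptions, not a defined `libsystemd` API: the control socket path, the one-line `start <service>` wire format and both function names are hypothetical.

```python
import os
import socket
import subprocess

# Hypothetical control socket path; the proposal does not name one.
CONTROL_SOCKET = "/run/systemd-control.sock"


def request_start(service, socket_path=CONTROL_SOCKET):
    """Ask the init daemon to start a service; raises OSError if unreachable."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        # Hypothetical wire format: a single text command per line.
        sock.sendall(f"start {service}\n".encode())
        return sock.recv(64).decode().strip()


def start(service, socket_path=CONTROL_SOCKET, initd_dir="/etc/init.d",
          run=subprocess.call):
    """Start via the daemon if possible, otherwise via the sysv init script."""
    try:
        return ("daemon", request_start(service, socket_path))
    except OSError:
        # Daemon not reachable: fall back to e.g. /etc/init.d/mysql start
        script = os.path.join(initd_dir, service)
        return ("initd", run([script, "start"]))
```

The `run` parameter exists only so the fallback path can be exercised without a real init script, for example `start("mysql", socket_path="/nonexistent.sock", run=print)`.
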
> 
> Note: the structure is flat, without any complex nested relations. Recursive
> dependencies are not easy to display and hard to follow. They give no benefit
> in the speed of starting services, but make it much harder to track whether
> some service is going to start or not. When the admin says that mysql must be
> stopped it must be stopped, even if phpmyadmin from inside httpd tries to use
> it. And when the admin says `service XXX restart` only the XXX service must be
> restarted, nothing else. The `service` utility may print a warning about other
> services using the same socket, but must not enforce any dependency (e.g.
> restarting network or syslog should not restart the entire system).
> 
> There's one more thing systemd is currently doing - cgroups. I'm not sure what
> they're for, since they do not solve any task and do not increase boot speed,
> but if they're really needed - they can be supported as a runner plugin for
> `libsystemd`. Every systemd daemon uses `libsystemd` to run services, so there
> can be a common function that is used to start a service. By default it would
> be a simple exec, but with plugin support. Why not just embed it in the
> systemd code? Because that's an additional unnecessary dependency that
> may break things. E.g. someone may want to build a kernel without cgroups.
> Or may want to use these daemons on a system that has no cgroups support
> (BSD/Solaris?). Or maybe someone wants to have cgroups used by a different
> application (ulatencyd).
> 
> There's another benefit you can get from a plugin-based runner - you get an
> easily extendable running system. E.g. if you want to put every service in
> some specific environment you just need to install a runner plugin.
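
A rough sketch of the runner-plugin idea: one common function starts every service, a plain exec is the default, and a plugin can wrap it. Everything here is illustrative and hypothetical (the registry, the runner names and the `SERVICE_ENV` variable); a cgroups plugin would register in the same way.

```python
import os
import subprocess
import sys

# Registry of runner plugins, keyed by name (hypothetical design).
RUNNERS = {}


def register_runner(name, func):
    """Install a runner plugin under the given name."""
    RUNNERS[name] = func


def plain_runner(argv, env):
    """Default runner: simply execute the service."""
    return subprocess.Popen(argv, env=env)


register_runner("plain", plain_runner)


def run_service(argv, env=None, runner="plain"):
    """Common entry point every daemon would use to start a service."""
    return RUNNERS[runner](argv, env)


def sandbox_env_runner(argv, env):
    """Example plugin: start the service in a specific environment."""
    base = os.environ if env is None else env
    # SERVICE_ENV is a made-up variable used only for this illustration.
    return plain_runner(argv, dict(base, SERVICE_ENV="sandbox"))


def demo():
    register_runner("sandbox-env", sandbox_env_runner)
    probe = ("import os, sys; "
             "sys.exit(0 if os.environ.get('SERVICE_ENV') == 'sandbox' else 1)")
    proc = run_service([sys.executable, "-c", probe], runner="sandbox-env")
    return proc.wait()
```

Systems without the plugin installed keep using `plain_runner`, so the optional feature adds no dependency to the core.
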
> 
> Such a structure is also easy to extend, e.g. if you want to expose these
> features over dbus for some configuration utilities (system-config-services)
> you just need to write a dbus interface to `libsystemd`.
> 
> So the entire project can be split into the following systemd packages:
> * libsystemd
>     content: /lib/libsystemd.so /lib/systemd-plugins/
>     configs: no
>     description: library containing client and server parts for
>                  systemd communication protocol
> * systemd (requires systemd-libs)
>     content: /sbin/systemd
>     configs: /etc/inittab /etc/rc.d/
>     description: sysv-compatible init daemon, nothing else
> * sysunixd (requires systemd-libs)
>     content: /sbin/sysunixd
>     configs: /etc/sysunixd.d/
>     description: inetd-like daemon, starts daemons using UNIX sockets
> * sysinetd (requires systemd-libs)
>     content: /sbin/sysinetd
>     configs: /etc/sysinetd.d/
>     description: inetd-like daemon, starts daemons using INET sockets
> * sysmountd
>     content: /sbin/sysmountd
>     configs: /etc/fstab /etc/crypttab /etc/fstab.d/ maybe
>     description: checks root filesystem in foreground and others in background
>     It does not share code with systemd and can be a totally separate package.
> * systemd-utils (requires systemd-libs)
>     content: /sbin/chkconfig /sbin/service /sbin/runlevel
>     configs: no
>     description: sysv-compatible utilities for systemd daemons
> * systemd-cgroups (requires systemd-libs)
>     content: /lib/systemd-plugins/libsystemd-runner-cgroups.so
>     configs: no
>     description: runner plugin for libsystemd
> * systemd-dbus (requires systemd-libs)
>     content: /etc/dbus-1/system.d/systemd.conf and the service itself
>     configs: no
>     description: dbus service to libsystemd
> 
> 
> "Why is this better?"
> =====================
> Because it's flexible, portable, simple, easy to support and it's the Unix way.
> 
> Such a structure would work under any circumstances on almost any configuration.
> Users of other Linux distributions can install `sysunixd`/`sysinetd` and it
> will work out of the box, together with any existing init system.
> 
> If someone, having a tightly integrated bunch of native upstart scripts, cannot
> switch the entire init system, he can still install `sysunixd` and `sysinetd`
> and get some speed improvements. Even if he cannot use `sysunixd` and
> `sysinetd` for starting services he can still benefit from faster disk
> mounting using `sysmountd`.
> 
> It's extremely portable. Users of other operating systems can install almost
> every component (except systemd-cgroups). Such a structure can be ported even
> to systems that don't support UNIX sockets (just disable `sysunixd`
> and patch `libsystemd` to use a different communication method).
> 
> It does not have any extra dependencies. So if someone needs to build a compact
> system for a netbook with 128MB RAM he can install just `systemd` and save a
> few MB of RAM without dbus (Xorg+IceWM don't need dbus) and other systemd
> services. There's also no need for dbus on ssh/dns/http servers.
> 
> Even a single `systemd-utils` package, without all the other systemd daemons,
> would still work for users, allowing them to put every service into a special
> running environment using runner plugins.
> 
> And weak dependencies between services make debugging easier. When something
> breaks it's easy to find the culprit and fix the bug.
> 
> 
> "Is this all?"
> ==============
> No, there're other ways to improve the structure.
> 
> For example, this structure forces the developers (or maintainers) of every
> service to write two startup scripts - one for systemd-based systems and one
> sysv-init script for non-systemd-based systems. Why not use a single script
> for that? For example, instead of writing a second startup script we just add
> the line:
>   # TCPListen: 22
> to the LSB header of /etc/init.d/sshd. That will make the transition easy and
> remain fully backward compatible. Services that don't have this line will
> be treated as usual sysv services.
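
Reading such a line out of an init script could look like the sketch below. It assumes the usual `### BEGIN INIT INFO` / `### END INIT INFO` delimiters of LSB headers; the `TCPListen` keyword is the one proposed above, while the function names and the sample script are made up for illustration.

```python
import re

SAMPLE = """#!/bin/sh
### BEGIN INIT INFO
# Provides:          sshd
# Required-Start:    $network
# Default-Start:     2 3 4 5
# TCPListen: 22
### END INIT INFO
"""


def parse_lsb_header(script_text):
    """Collect 'Key: value' comment lines between the LSB header markers."""
    fields = {}
    in_header = False
    for line in script_text.splitlines():
        if line.startswith("### BEGIN INIT INFO"):
            in_header = True
        elif line.startswith("### END INIT INFO"):
            break
        elif in_header:
            match = re.match(r"#\s*([\w-]+):\s*(.*)", line)
            if match:
                fields[match.group(1)] = match.group(2).strip()
    return fields


def tcp_listen_port(script_text):
    """Return the TCPListen port, or None for a usual sysv service."""
    value = parse_lsb_header(script_text).get("TCPListen")
    return int(value) if value else None
```

A script without the line yields `None`, so it would simply be treated as a regular sysv service, which is the backward compatibility the paragraph above asks for.
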
> 
> That also makes developers' work easier - they won't have to write two
> startup scripts and worry about compatibility.
> 
> But let's leave the details alone. This email is already long enough. :)
> 
> 
> The End
> =======
> I'm writing these ideas here because I'm not experienced enough to implement
> them myself. So I just hope that people on this list, who are much more
> experienced, will like them. Thanks for reading.
> 
> If you have any questions - please, ask them.
> 
