[systemd-devel] binding tmpfiles.d to unit startup

Sun Mar 2 14:51:06 PST 2014

On Sat, 01.03.14 14:46, Colin Walters (walters at verbum.org) wrote:

Hi!

> So for OSTree I am trying to move to a model where services populate
> the contents of /var on *start*.  See previous discussion here:
> 
> https://www.mail-archive.com/systemd-devel@lists.freedesktop.org/msg07859.html
> 
> The really great part about this is that one is then able to totally
> reset OS state at any time by simply just doing a shutdown of
> services, then "rm -rf /var/*", then  reboot.  (You can also reset
> /etc, that's a separate discussion)

There has been this long-time TODO list item of ours to create something
we called "provisioning" (though the name might be misleading, but
that's the keyword we are currently using for it). The provisioning
scheme would basically consist of another directory in /usr/lib that
contains snippets that are inspired by tmpfiles.d/, but are
different. These files would describe what to do if /etc or /var are
found empty at boot. They would not only list files and directories to
create or copy, but also contain information so that we can reconstruct
/etc/passwd and /etc/group to match UIDs and GIDs used in /usr already.

Maybe something like this:

    u root 0
    g mail /usr/bin/procmail
    g tty /usr/bin/write
    d /var/lib/foobar 664 root root
    c /etc/sudoers /usr/share/sudo/sudoers.default

This snippet would create one user with UID 0 and call it "root". It would
create a group "mail" with a GID that matches the current gid owner of
/usr/bin/procmail, and one "tty" with the GID that matches the current
gid owner of /usr/bin/write. Then, it would also create a dir in
/var/lib. Finally it would copy a sudoers file into place from some
source in /usr (though maybe we should not allow copying files with
this, so that people don't get too lazy, and instead just provide
symlinks).
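
To make the idea a bit more concrete, here is a rough Python sketch of
how a tool might read such a snippet. The syntax is only the "maybe
something like this" proposal from above, and the function names
(parse_snippet, gid_of) are made up for illustration; nothing like this
exists in systemd.

```python
import os

# Hypothetical line types, taken from the example snippet in this mail.
KINDS = {"u": "user", "g": "group", "d": "dir", "c": "copy"}


def parse_snippet(text):
    """Parse the proposed provisioning syntax into (kind, args) tuples."""
    actions = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skipping blank lines and '#' comments is an assumption,
        # borrowed from how tmpfiles.d files behave.
        if not line or line.startswith("#"):
            continue
        verb, *args = line.split()
        if verb not in KINDS:
            raise ValueError(f"unknown line type: {verb!r}")
        actions.append((KINDS[verb], tuple(args)))
    return actions


def gid_of(path):
    """Return the GID owning an existing file, as the 'g' lines would need."""
    return os.stat(path).st_gid


EXAMPLE = """
u root 0
g mail /usr/bin/procmail
g tty /usr/bin/write
d /var/lib/foobar 664 root root
c /etc/sudoers /usr/share/sudo/sudoers.default
"""

if __name__ == "__main__":
    for kind, args in parse_snippet(EXAMPLE):
        print(kind, *args)
```

A real tool would then walk the parsed list and, for each "group"
entry, call something like gid_of() on the referenced binary before
writing /etc/group.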

Then, we'd add a generator that checks for the existence of
/etc/machine-id or so (which we simply use as a flag file for
uninitialized systems here). If it is missing we boot into a special
boot target "provision.target" or so, which runs our provisioning tool
that simply reconstructs everything according to these files, and then
continues booting into default.target. There could even be a kernel
cmdline option that results in this being executed called
"systemd.provision=1" or so. Since "provision.target" is a target like
any other, packages could even pull in their own code snippets from this
target if they really want to, but of course this would be quite
contrary to the entire goal which is to have a declarative description
of the system rather than one in code.
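
The decision such a generator would make can be sketched in a few lines
of Python. This only models the logic described above; a real generator
would be a small binary writing symlinks into the generator output
directory, and both "provision.target" and "systemd.provision=1" are
just the tentative names from this mail. The function names are
hypothetical.

```python
import os


def wants_provisioning(root="/", cmdline=""):
    """Return True if the tree under `root` looks uninitialized."""
    # A missing /etc/machine-id is used as the flag for a fresh system.
    flag = os.path.join(root, "etc", "machine-id")
    if not os.path.exists(flag):
        return True
    # Tentative kernel command line switch that forces provisioning.
    return "systemd.provision=1" in cmdline.split()


def boot_target(root="/", cmdline=""):
    """Pick the target to boot into first."""
    if wants_provisioning(root, cmdline):
        return "provision.target"
    return "default.target"


if __name__ == "__main__":
    # /proc/cmdline is where the kernel command line is exposed on Linux.
    try:
        with open("/proc/cmdline") as f:
            cmdline = f.read()
    except OSError:
        cmdline = ""
    print(boot_target("/", cmdline))
```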

These snippets could then also be hooked up with RPM, so that RPM adds
the files listed therein automatically to its file list, and they can be
executed at package installation time with some RPM macro.

The provisioning tool that applies these files could of course also be
run manually on the command line and, much like tmpfiles, support a
scheme to only reconstruct a subset of the file system hierarchy (for
example only /var).

For a container use case this would allow us to have an OS /usr tree
somewhere which we then can mount into hundreds of containers, and on
their first start-up they would get a fully populated /var and /etc.

Colin, when you ask for doing this setup for /var right before starting
up a service, wouldn't it be nicer if we just did that on package
installation if we can, and on boot if /var is empty? Wouldn't this
"provisioning" concept work for you?

> Thoughts?  Should be a pretty easy patch.

I am not convinced that it would be a good idea to place information
about this into the unit files. The unit files have been designed to be
something we only read on demand, when referenced. However, such
reconstruction logic is this kind of static thing where such an
on-demand concept would be really inappropriate.  This makes it "feel" a
bit incompatible, I'd say.

Also, it just sounds wrong to make changes to persistent file systems
during runtime, when they could be done at install time already...

Another problem is the one of NSS. We cannot resolve user names from
PID1, since we cannot block on the network, we cannot be client to other
services and we cannot have dynamic NSS modules loaded into PID 1. Thus
creating files from PID 1 is difficult. (Though not impossible. That's
where a restriction like RuntimeDirectory= would be nice btw, since we
know that at least the backing fs isn't blocking, and we could add this
into the preparation step for executing processes, i.e. right after the
fork() in the child, before we exec() the daemon to start. Or we could
even introduce a new service state that is done before ExecStartPre= and
that doesn't execute any external binaries, but simply does the NSS
stuff and fs access in a forked off process that does what it needs to
do and quickly exits, and never does exec(). -- but anyway, the
take-away here is probably that it is harder than it might sound...)
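
The "forked off process that never does exec()" idea could look roughly
like the following, written in Python purely as an illustration (PID 1
is C, and this is not how systemd implements anything). The helper name
make_dir_in_child is made up. The point is only that the user and group
lookups, which go through NSS, and the file system access happen in a
short-lived child, so the parent never blocks on them.

```python
import grp
import os
import pwd


def make_dir_in_child(path, user, group, mode=0o755):
    """Create `path` owned by user:group in a forked child without exec()."""
    pid = os.fork()
    if pid == 0:
        # Child: name resolution and fs access happen only here.
        status = 1
        try:
            uid = pwd.getpwnam(user).pw_uid
            gid = grp.getgrnam(group).gr_gid
            os.makedirs(path, mode=mode, exist_ok=True)
            os.chown(path, uid, gid)
            status = 0
        except (KeyError, OSError):
            # Unknown user/group or a file system error: report failure.
            status = 1
        finally:
            # _exit() leaves immediately, without running any cleanup
            # handlers inherited from the parent.
            os._exit(status)
    # Parent: collect the child's exit status.
    _, wait_status = os.waitpid(pid, 0)
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0
```

For simplicity the parent here waits synchronously; a service manager
would instead watch for the child's exit from its event loop.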

Lennart

-- 
Lennart Poettering, Red Hat