[systemd-devel] RootImage='s implementation

Wed Feb 7 10:28:07 UTC 2018

On Wed, 07.02.18 07:34, worz (worz at tuta.io) wrote:

> Hi list, after looking at the code that implements the RootImage=
> directive, I was wondering why it was necessary to do this in PID 1
> (the code seems to have been partially borrowed from what nspawn uses),

It's not done in PID 1. It's done between the fork() and execve() in
the child process that is going to become the service's main
process.
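To make the fork()/execve() split concrete, here is a minimal C sketch. It is not systemd's actual code: `spawn_service()`, `wait_exit_status()` and the `child_setup_fn` hook are hypothetical names. The hook marks the point where namespace and root-directory setup would run. It runs in the child only, so nothing it does affects the parent (PID 1).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-service setup hook. In a service manager this is where
 * namespaces would be created and RootImage=/RootDirectory= applied. */
typedef int (*child_setup_fn)(void *userdata);

/* Hypothetical: fork, run the setup hook in the child, then execve() the
 * service binary. Returns the child's PID in the parent, or -errno if
 * fork() fails. */
static pid_t spawn_service(const char *path, char *const argv[],
                           char *const envp[], child_setup_fn setup,
                           void *userdata) {
    assert(path && argv && envp);

    pid_t pid = fork();
    if (pid < 0)
        return -errno;
    if (pid == 0) {
        /* Child: still the manager's code, but in the process that is about
         * to become the service's main process. Anything done here only
         * affects this child, not the parent. */
        if (setup && setup(userdata) < 0)
            _exit(EXIT_FAILURE);
        execve(path, argv, envp);
        _exit(127); /* only reached if execve() failed */
    }
    return pid; /* parent */
}

/* Wait for the child; return its exit status, or -1 if it did not exit
 * normally or waitpid() failed. */
static int wait_exit_status(pid_t pid) {
    int status;
    while (waitpid(pid, &status, 0) < 0)
        if (errno != EINTR)
            return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `spawn_service("/bin/sh", argv, envp, NULL, NULL)` with `argv = {"sh", "-c", "exit 3", NULL}` starts a shell whose exit status, 3, is then returned by `wait_exit_status()`.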

> and why RootImage= was not internally converted to
> RootDirectory=/run/some/private/dir/, with a small helper binary
> mounting the image on that directory. That way it could be extended
> to support many image types without adding code to PID 1. My first
> thought was whether this could be done with a mount unit, so that
> systemd implicitly adds RequiresMountsFor= for that directory, or
> adds a dependency on such a transient mount unit. What do other
> people think of this, and why was this approach not chosen instead?
> (I figure this is an implementation detail, so things can always be
> reworked.) Thanks!

Well, the intention is to keep this relatively simple. Whether we
invoke the mount() system call with MS_BIND to bind mount an existing
directory as the service's root directory, or invoke mount() without
that flag and with a block device as the source instead, isn't a big
difference...
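A minimal C sketch of the two variants. It assumes the image has already been attached to a block device (for example a loop device) and that its filesystem type is known. Locating the right partition inside the image is left out. `struct root_mount`, `root_mount_for()` and `apply_root_mount()` are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mount.h>

/* Hypothetical: arguments for the single mount() call that sets up a
 * service's root. */
struct root_mount {
    const char *source;   /* directory to bind, or block device */
    const char *fstype;   /* NULL for bind mounts */
    unsigned long flags;  /* MS_BIND for a directory, 0 for a device */
};

/* Hypothetical helper: choose mount() arguments for either an existing
 * directory (RootDirectory=-style) or a block device holding a filesystem
 * (RootImage=-style). Assumes the device is already set up. */
static struct root_mount root_mount_for(const char *directory,
                                        const char *device,
                                        const char *fstype) {
    assert(directory || device);
    if (device)
        return (struct root_mount){ .source = device, .fstype = fstype, .flags = 0 };
    return (struct root_mount){ .source = directory, .fstype = NULL, .flags = MS_BIND };
}

/* Perform the mount onto `target`. Needs CAP_SYS_ADMIN and should run in
 * the service's own mount namespace. Returns 0, or -1 with errno set. */
int apply_root_mount(const struct root_mount *m, const char *target) {
    return mount(m->source, target, m->fstype, m->flags, NULL);
}
```

Only the source, the filesystem type and the MS_BIND flag differ between the two cases. The mount() call itself stays the same.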

Note that preparing the image needs to take place in the mount
namespace created for the newly invoked service. That means all of
this has to happen after fork()ing and after creating the namespace,
but before execve()ing the binary. Of course, we could invoke another
binary in that context, but that makes things quite a lot more complex
for little gain: the code would need to run with full privileges
anyway, and inserting code that runs external programs into the logic
that runs external programs is kinda recursive. Moreover, you'd have
to pass a non-trivial amount of information between the child process
and that binary, which adds complexity of its own.
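The ordering constraint can be sketched in C as the body of such an in-child setup step. This rests on stated assumptions: `struct image_params` and `prepare_root_in_child()` are hypothetical, the image is assumed to be attached to a block device already, and chroot() is used for brevity. It must run in the forked child, because unshare(CLONE_NEWNS) and the subsequent mounts affect only the calling process.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stddef.h>
#include <sys/mount.h>
#include <unistd.h>

/* Hypothetical description of an already-attached root image. */
struct image_params {
    const char *device;  /* e.g. a loop device backing the image file */
    const char *fstype;  /* e.g. "ext4" */
    const char *target;  /* empty directory to mount the image on */
};

/* Hypothetical: the steps that must happen in the forked child, in this
 * order, before execve(). Returns 0 or -errno. */
int prepare_root_in_child(const struct image_params *p) {
    if (!p || !p->device || !p->fstype || !p->target)
        return -EINVAL;

    /* 1. Give this process its own mount namespace. */
    if (unshare(CLONE_NEWNS) < 0)
        return -errno;

    /* 2. Keep mounts made below from propagating back to the host. */
    if (mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL) < 0)
        return -errno;

    /* 3. Mount the image; only this namespace sees it. */
    if (mount(p->device, p->target, p->fstype, 0, NULL) < 0)
        return -errno;

    /* 4. Make it the root directory for the binary execve()'d next. */
    if (chdir(p->target) < 0 || chroot(".") < 0)
        return -errno;

    return 0;
}
```

Delegating step 3 to a separate helper binary would mean forking and execve()ing that helper inside the new namespace, handing it the image parameters, and waiting for its result before continuing with step 4.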

Hence, to justify something this complex there really needs to be a
strong reason, and I fail to see one here.

Lennart

-- 
Lennart Poettering, Red Hat