[systemd-devel] Questions about systemd's "root storage daemon" concept

Martin Wilck mwilck at suse.com
Mon Jan 25 18:04:53 UTC 2021


On Mon, 2021-01-25 at 18:33 +0100, Lennart Poettering wrote:
> On Sa, 23.01.21 02:44, Martin Wilck (mwilck at suse.com) wrote:
> 
> > Hi
> > 
> > I'm experimenting with systemd's root storage daemon concept
> > (https://systemd.io/ROOT_STORAGE_DAEMONS/).
> > 
> > I'm starting my daemon from a service unit in the initrd, and
> > I set argv[0][0] to '@', as suggested in the text.
> > 
> > So far so good, the daemon isn't killed. 
> > 
> > But a lot more is necessary to make this actually *work*. Here's a
> > list of issues I found, along with the ideas I've had so far on how
> > to deal with them. I'd appreciate some guidance.
> > 
> > 1) Even if a daemon is exempted from being killed by killall(), the
> > unit it belongs to will be stopped when initrd-switch-root.target is
> > isolated, and that will normally cause the daemon to be stopped, too.
> > AFAICS, the only way to ensure the daemon is not killed is by setting
> > "KillMode=none" in the unit file. Right? Any other mode would send
> > SIGKILL sooner or later even if my daemon was smart enough to ignore
> > SIGTERM when running in the initrd.
> 
> Consider using IgnoreOnIsolate=.

Ah, thanks a lot. IIUC that would actually make systemd realize that
the unit continues to run after switching root, which is good.

As I remarked for KillMode=none, IgnoreOnIsolate=true would be
suitable only for the "root storage daemon" instance, not for a
possible other instance serving data volumes only. I suppose there's
no way to make this directive conditional on being run from the
initrd, so I'd need two different unit files, or a drop-in in the
initrd.
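
For illustration, such an initrd-only drop-in might look roughly like
the sketch below. The unit name "my-storage-daemon.service" and the
file name are hypothetical, and this assumes the drop-in is shipped
only in the initrd image, so that the unit file installed in the real
root does not carry the setting.

```ini
# Hypothetical location, present in the initrd image only:
#   /etc/systemd/system/my-storage-daemon.service.d/10-root-storage.conf
[Unit]
# Do not stop this unit when another unit (e.g. initrd-switch-root.target)
# is isolated.
IgnoreOnIsolate=true
```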

Is there any way for the daemon to get notified if root is switched?

> 
> > 3) The daemon that has been started in the initrd's root file system
> > is unable to access e.g. the /dev file system after switching
> > root. I haven't yet systematically analyzed which file systems are
> > available. I suppose this must be handled by creating bind mounts,
> > but I need guidance how to do this. Or would it be
> > possible/advisable for the daemon to also re-execute itself under
> > the real root, like systemd itself? I thought the root storage
> > daemon idea was developed to prevent exactly that.
> 
> Not sure why it wouldn't be able to access /dev after switching. We do
> not allocate any new instance of that, it's always the same devtmpfs
> instance.

I haven't dug deeper yet; I just saw "No such file or directory"
error messages when trying to access device nodes that I knew existed,
so I concluded there were issues with /dev.

> Do not reexec onto the host fs, that's really not how this should be
> done.

Would there be a potential security issue because the daemon keeps a
reference to the initrd root FS?

> 
> > 4) Most daemons that might qualify as "root storage daemon" also have
> > a "normal" mode, when the storage they serve is _not_ used as root FS,
> > just for data storage. In that case, it's probably preferable to run
> > them from inside the root FS rather than as root storage daemon. That
> > has various advantages, e.g. the possibility to update the software
> > without rebooting. It's not clear to me yet how to handle the two
> > options (root and non-root) cleanly with unit files.
> 
> option one: have two unit files? i.e. two instances of the subsystem,
> one managing the root storage, and one the rest.

Hm, that looks clumsy to me. It could be done e.g. for multipath by
using separate configuration files and setting up appropriate
blacklists, but it would cause a lot of work to be done twice. For
example, uevents would be received by both daemons and acted upon
simultaneously. Generally ruling out race conditions wouldn't be easy.

Imagine two parallel instances of systemd-udevd (IMO there are reasons
to handle it like a "root storage daemon" in some distant future).

> option two: if you cannot have multiple instances of your subsystem,
> then the only option is to make the initrd version manage
> everything. Of course, that sucks, but there's little one can do
> about that.

Why would it be so bad? I would actually prefer a single instance for
most subsystems. But maybe I'm missing something.

Thanks,
Martin


