[systemd-devel] Questions about systemd's "root storage daemon" concept

Lennart Poettering lennart at poettering.net
Wed Jan 27 22:56:32 UTC 2021


On Mi, 27.01.21 21:51, Martin Wilck (mwilck at suse.com) wrote:

> Meanwhile I've looked a bit deeper into the problems accessing "/dev"
> that I talked about in my other post. scandir on "/" actually returns
> an empty directory after switching root, and any path lookups for
> absolute paths fail. I didn't expect that, because I thought systemd
> removed the contents of the old root, and stopped on (bind) mounts.
> Again, this is systemd-234.

Oh, right, we actually use MS_MOVE to move the old /dev to the new
root. If you stay behind in the old root you won't see anything
anymore: it got moved away.

Note that the switch root code also attempts to empty out the initrd
after the transition, or what's left of it. You might want to make the
initrd read-only if that is a problem for you.

> If I chdir("/run") before switching root and chroot("..") afterwards
> (*), I'm able to access everything just fine (**). However, if I do
> this, I end up in the real root file system, which is what I wanted to
> avoid in the first place.

Yes, this works the way it works, because /run is moved to the new
root, and thus if you chroot to its parent you are in the new root.

> So, I guess I'll have to create bind mounts for /dev, /sys etc. in the
> old root, possibly after entering a private mount namespace?

If you want the initrd environment to fully continue to exist,
consider creating a new mount namespace and bind mounting the initrd
root into it recursively, onto some new dir you created. Then mark
that mount MS_PRIVATE, and finally pivot_root()+chroot()+chdir() into
your new old world.
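
A rough sketch of those steps in C, untested in a real initrd.
keep_initrd_world() and its new_root argument are names made up for
illustration; new_root is assumed to be a directory that already
exists inside the initrd. The pivot_root(".", ".") plus
umount2(MNT_DETACH) idiom comes from pivot_root(2) rather than from the
steps above, and the MS_REC on the MS_PRIVATE call is an addition here.
pivot_root(2) documents EINVAL for a root that is on rootfs or has
shared propagation, so that step may need adjusting on a given setup;
the sketch only reports the error.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for unshare() and chroot() */
#endif
#include <errno.h>
#include <sched.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: keep the initrd visible to this process only.
 * new_root must be an absolute path to an existing directory.
 * Returns 0 on success, -errno on failure. Needs CAP_SYS_ADMIN. */
static int keep_initrd_world(const char *new_root)
{
    if (new_root == NULL || new_root[0] != '/')
        return -EINVAL;

    /* 1. New mount namespace for this process. */
    if (unshare(CLONE_NEWNS) < 0)
        return -errno;

    /* 2. Recursive bind mount of the initrd root onto new_root. */
    if (mount("/", new_root, NULL, MS_BIND | MS_REC, NULL) < 0)
        return -errno;

    /* 3. Mark that mount private (MS_REC is an assumption, so that
     *    submounts are covered as well). */
    if (mount(NULL, new_root, NULL, MS_PRIVATE | MS_REC, NULL) < 0)
        return -errno;

    /* 4. pivot_root() into it. glibc has no wrapper, hence syscall().
     *    With "." for both arguments the old root ends up stacked on
     *    the new one and is detached right afterwards. */
    if (chdir(new_root) < 0)
        return -errno;
    if (syscall(SYS_pivot_root, ".", ".") < 0)
        return -errno;
    if (umount2(".", MNT_DETACH) < 0)
        return -errno;

    /* 5. chroot() + chdir() so root and cwd agree. */
    if (chroot(".") < 0)
        return -errno;
    if (chdir("/") < 0)
        return -errno;

    return 0;
}
```

Something like keep_initrd_world("/keep") would be called once, early,
before the switch root happens; "/keep" is just a placeholder name.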

Also, make the initrd superblock read-only if you need its contents.

> The other option would be to save fd's for the file systems I need to
> access and use opendirat() only. Right?

That works too, if you can.

> (**) For notification about switching root, I used epoll(EPOLLPRI) on
> /proc/self/mountinfo, because I read that inotify doesn't work on proc.
> polling for EPOLLPRI works just fine.

Right, sorry. POLLPRI is the right API. inotify is used by cgroupfs
for similar notifications, and I mixed that up. For
/proc/self/mountinfo, POLLPRI is the right choice.

Lennart

--
Lennart Poettering, Berlin

