[systemd-devel] namespace problem

Mantas Mikulėnas grawity at gmail.com
Thu Jul 18 21:08:58 UTC 2024


On Thu, Jul 18, 2024, 15:43 Thomas Köller <thomas at koeller.dyndns.org> wrote:

> On 18.07.24 at 14:04, Mantas Mikulėnas wrote:
> > Yes, but namespace persistence actually relies on filesystem access –
> > it's implemented as a bind-mount of the namespace file descriptor (onto
> > /run/netns for the 'ip netns' tool), as otherwise namespaces only exist
> > as long as processes that hold them.
> >
> > So if you have any service options that cause a new *mount* namespace to
> > be created (preventing its filesystem mounts from being visible outside
> > the unit), then it cannot pin persistent network namespaces.
>
> Quoting the manual page:
>         ProtectSystem=
>             Takes a boolean argument or the special values "full" or
>             "strict". If true, mounts the /usr/ and the boot loader
>             directories (/boot and /efi) read-only for processes invoked
>             by this unit. If set to "full", the /etc/ directory is
>             mounted read-only, too.
>
> No mention of /var or /run.


It still works this way whether it's mentioned or not. Once the unit's
process is put in a new mount namespace, the entire `/` is marked private
so that any mounts made underneath `/` remain visible only in that
namespace. This equally affects the "read-only /etc" mount done by systemd
itself as well as the /run/netns mount done by 'ip' or any other mounts
done anywhere else.

In theory it would be possible to carve out exceptions such as marking /run
shared again, but then /run/systemd would need to be marked private again,
etc. – and mount propagation across namespaces is complex enough as it is.
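To see which propagation mode each mount actually has, the optional fields in /proc/self/mountinfo can be read directly. The following is a minimal sketch, assuming the standard mountinfo layout (optional fields between the mount options and the "-" separator); the function name `show_propagation` is made up for illustration.

```shell
#!/bin/sh
# Print "<mount point> <propagation>" for each line of a mountinfo file.
# Optional fields such as "shared:N" or "master:N" sit between field 6
# (mount options) and the "-" separator; a mount with none is private.
# Defaults to /proc/self/mountinfo when no file is given.
show_propagation() {
    awk '{
        prop = ""
        for (i = 7; i <= NF && $i != "-"; i++) {
            if ($i ~ /^shared:/)    prop = prop (prop ? "," : "") "shared"
            if ($i ~ /^master:/)    prop = prop (prop ? "," : "") "slave"
            if ($i == "unbindable") prop = prop (prop ? "," : "") "unbindable"
        }
        print $5, (prop ? prop : "private")
    }' "${1:-/proc/self/mountinfo}"
}
```

Running `show_propagation | grep /run` once inside the unit and once on the host should show whether /run and /run/netns are shared, slave or private in each namespace.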

> Also, note that the bind mounts in
> /var/run/netns and /run/netns are actually created by 'ip netns add',
> they just aren't usable.
>

No, only the mount *points* in /run/netns are created (as regular empty
files). They don't become actual mounts outside the unit's mount namespace,
which is why they're not usable.

There's a distinction between mount points (files or directories seen in
`ls`) and mounts (seen in `findmnt`) – make your service script log its
findmnt output to a file and compare it to findmnt output seen from the
outside.
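One possible way to do that comparison is sketched below. It assumes both listings were saved with `findmnt --list --noheadings --output TARGET`; the function name `only_in_first` and any file paths are hypothetical.

```shell
#!/bin/sh
# Print mount targets that appear in the first listing but not the second.
# Both arguments are files holding one mount target per line, e.g. from
#   findmnt --list --noheadings --output TARGET
only_in_first() {
    first_sorted=$(mktemp) && second_sorted=$(mktemp) || return 1
    # comm needs sorted input; -u also drops stacked mounts on one target.
    sort -u "$1" > "$first_sorted"
    sort -u "$2" > "$second_sorted"
    # -23 suppresses lines unique to the second file and lines in both.
    comm -23 "$first_sorted" "$second_sorted"
    rm -f "$first_sorted" "$second_sorted"
}
```

For example, the service script could save its listing to a file under /run right after `ip netns add`, the same command could be run from a normal shell, and `only_in_first inside.txt outside.txt` would then list the mounts that exist only inside the unit.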

(ember) /home/grawity $ mount | grep netns
tmpfs on /run/netns type tmpfs (rw,nosuid,nodev,size=3268196k,nr_inodes=819200,mode=755,inode64)
(ember) /home/grawity $ sudo systemd-run --shell -p ProtectSystem=full
Running as unit: run-u1253.service; invocation ID: 9d4675b9ef7c40d68486b3058ee8a60b
Press ^] three times within 1s to disconnect TTY.
root@ember /home/grawity # mount | grep netns
tmpfs on /run/netns type tmpfs (rw,nosuid,nodev,size=3268196k,nr_inodes=819200,mode=755,inode64)
root@ember /home/grawity # ip netns add foo
root@ember /home/grawity # mount | grep netns
tmpfs on /run/netns type tmpfs (rw,nosuid,nodev,size=3268196k,nr_inodes=819200,mode=755,inode64)
nsfs on /run/netns/foo type nsfs (rw)
root@ember /home/grawity # exit
Finished with result: success
Main processes terminated with: code=exited, status=0/SUCCESS
Service runtime: 18.451s
(ember) /home/grawity $ mount | grep netns
tmpfs on /run/netns type tmpfs (rw,nosuid,nodev,size=3268196k,nr_inodes=819200,mode=755,inode64)
(ember) /home/grawity $

(The non-systemd rough equivalent is `unshare --mount
--propagation=private`, and you can attach to a namespace using `nsenter` –
an "ip netns exec" is approximately an `nsenter --net`.)
