[systemd-devel] systemd-nspawn with filesystem id mapping
systemd-devel at notandy.de
Sun May 30 21:21:43 UTC 2021
Hi!
I was very pleased to see pull request #19438 ("nspawn: add support for kernel 5.12 ID mapping mounts") and went right ahead to try it out.
The following was tested on the current git head of systemd, running on Arch Linux.
At a high level, what I am trying to achieve is roughly an emulation of bubblewrap: running Chromium under Wayland with GPU acceleration and working audio via PipeWire.
For that I need to pass some sockets and devices to the container using --bind-ro. I want to use --private-users=pick to get easier separation between multiple containers.
That means I do not know which host UID the container's processes will run as until nspawn spawns the container, which causes problems accessing the sockets.
Until now I have used setfacl to work around this limitation and grant access to the sockets.
I was hoping that --private-users-ownership=map would let me skip that.
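For context, the setfacl workaround only needs the host UID that the container's "user" ends up running as. With --private-users=pick, nspawn maps container UIDs 0 to 65535 onto a contiguous host range starting at the selected base, so container UID 1000 becomes base + 1000 (552207336 for the base 552206336 reported in the output below). The sketch below is a minimal illustration, not part of systemd: the function name container_uid_to_host is hypothetical, and it assumes you already know the container leader's PID so you can read /proc/<pid>/uid_map, which uses the kernel's "inside outside count" line format.

```shell
#!/bin/sh
# Hypothetical helper (not part of systemd): translate a UID as seen inside
# a user namespace into the host UID, using a uid_map file in the kernel's
# "<inside> <outside> <count>" format, e.g. /proc/<leader-pid>/uid_map.
container_uid_to_host() {
    map_file=$1
    cuid=$2
    while read -r inside outside count; do
        # Each line maps [inside, inside+count) onto [outside, outside+count).
        if [ "$cuid" -ge "$inside" ] && [ "$cuid" -lt $((inside + count)) ]; then
            echo $((outside + cuid - inside))
            return 0
        fi
    done < "$map_file"
    return 1  # the UID has no mapping in this namespace
}

# Example use (assumes LEADER holds the container's leader PID):
#   host_uid=$(container_uid_to_host "/proc/$LEADER/uid_map" 1000)
#   setfacl -m "u:${host_uid}:rw" /run/user/1000/wayland-1
```

The catch, and the reason for hoping --private-users-ownership=map would help, is that this lookup can only run after the container exists, because the base is picked at spawn time.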
I'm passing three sockets owned by UID 1000 on the host to a container with --private-users=pick and trying to access them as UID 1000 (name "user") inside the container.
Everything is on an ext4 file system. I'd prefer btrfs, but it (so far) lacks ID mapping support.
The full invocation looks like this:
statepath="/machines/state/chromium/${profilename}"
systemd-nspawn \
-D /machines/images/archlinux-chromium/ \
--private-users=pick \
--private-users-ownership=map \
--no-new-privileges=yes \
--as-pid2 \
--machine "chromium-${profilename}" \
--user user \
--bind-ro /var/run/user/1000/pulse/native:/sockets/pulse/native \
--bind-ro /var/run/user/1000/wayland-1:/sockets/wayland-1 \
--bind-ro /var/run/user/1000/pipewire-0:/sockets/pipewire-0 \
--bind "${statepath}:/home/user" \
--bind /dev/dri/renderD128 \
-E WAYLAND_DISPLAY=wayland-1 \
-E XDG_RUNTIME_DIR=/sockets \
chromium --enable-features=UseOzonePlatform --ozone-platform=wayland
This results in the following output:
Spawning container chromium-default on /machines/images/archlinux-chromium.
Press ^] three times within 1s to kill container.
Selected user namespace base 552206336 and range 65536.
Failed to create mount point /machines/images/archlinux-chromium/sockets/pipewire-0: Value too large for defined data type
I've run it under strace; this is the relevant output:
[pid 524] mount("/machines/state/chromium/default", "/proc/self/fd/8", NULL, MS_BIND|MS_REC, NULL) = 0
[pid 524] close(8) = 0
[pid 524] newfstatat(AT_FDCWD, "/var/run/user/1000/pipewire-0", {st_mode=S_IFSOCK|0666, st_size=0, ...}, 0) = 0
[pid 524] openat(AT_FDCWD, "/machines/images/archlinux-chromium", O_RDONLY|O_CLOEXEC|O_PATH|O_DIRECTORY) = 8
[pid 524] openat(8, "sockets", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 10
[pid 524] newfstatat(10, "", {st_mode=S_IFDIR|0700, st_size=4096, ...}, AT_EMPTY_PATH) = 0
[pid 524] close(8) = 0
[pid 524] openat(10, "pipewire-0", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = -1 ENOENT (No such file or directory)
[pid 524] close(10) = 0
[pid 524] newfstatat(AT_FDCWD, "/machines/images/archlinux-chromium/sockets", {st_mode=S_IFDIR|0700, st_size=4096, ...}, 0) = 0
[pid 524] openat(AT_FDCWD, "/machines/images/archlinux-chromium/sockets/pipewire-0", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = -1 ENOENT (No such file or directory)
[pid 524] openat(AT_FDCWD, "/machines/images/archlinux-chromium/sockets/pipewire-0", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0644) = -1 EOVERFLOW (Value too large for defined data type)
[pid 524] writev(2, [{iov_base="Failed to create mount point /ma"..., iov_len=122}, {iov_base="\n", iov_len=1}], 2Failed to create mount point /machines/images/archlinux-chromium/sockets/pipewire-0: Value too large for defined data type
) = 123
This corresponds to the touch() call at line 754 of nspawn-mount.c.
If I drop the --bind/--bind-ro options, the container starts fine (though of course Chromium doesn't work); the same is true if I keep the binds and remove --private-users-ownership=map.
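One possible reading of the EOVERFLOW, offered as an assumption rather than a diagnosis: on idmapped mounts the kernel refuses to create a new inode when the caller's fsuid/fsgid cannot be mapped through the mount's idmapping back into the filesystem's ID space, and reports this as EOVERFLOW (as far as I can tell, the 5.12 check sits in fs/namei.c around may_o_create()). nspawn creates the bind-mount target as host root (UID 0), but with --private-users-ownership=map only host UIDs 552206336 to 552271871 would have a mapping on the image mount, so UID 0 would not. The shell sketch below models only that range check; it is a simplification, not kernel code, and both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical model (not kernel code) of the check done when creating a
# file on an idmapped mount: the caller's host fsuid must fall inside the
# "outside" range of the mount's idmapping, otherwise the create fails.
# Map lines use the uid_map format "<inside> <outside> <count>", where
# "inside" is the on-disk ID and "outside" is the ID the host sees.
host_uid_to_disk() {
    map_file=$1
    host_uid=$2
    while read -r inside outside count; do
        if [ "$host_uid" -ge "$outside" ] && [ "$host_uid" -lt $((outside + count)) ]; then
            echo $((inside + host_uid - outside))
            return 0
        fi
    done < "$map_file"
    return 1  # no mapping for this host UID
}

# Simulated create: prints the on-disk owner the new file would get, or
# EOVERFLOW when the caller's UID has no mapping (the only failure modeled).
simulate_create() {
    if disk_uid=$(host_uid_to_disk "$1" "$2"); then
        echo "created, on-disk owner $disk_uid"
    else
        echo "EOVERFLOW"
        return 1
    fi
}
```

With a map file containing "0 552206336 65536", simulate_create with UID 0 prints EOVERFLOW, which would match the failing openat() above, while UID 552207336 maps to on-disk UID 1000.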
At this point I'm somewhat lost as to how to proceed.
Have I made a mistake, or a wrong assumption about how this is supposed to work?
Should I open an issue on GitHub about it?
Thanks,
nd