[systemd-devel] systemd-container: Trying to use a bookworm chroot with a buster host fails / Failed to create /init.scope control group

Bernhard Übelacker bernhardu at mailbox.org
Sat Dec 3 22:38:55 UTC 2022


(Resent after subscription, as non-subscribers get rejected.)



Hello,
I opened the initial Debian bug report. When I went to ask on
systemd-devel, I found the question had already been raised in this
thread, so I am trying to provide further information.



> > Do you have any MACs in effect?
> No SELinux or Apparmor active

As far as I can see, in my test VM with minimal Debian Buster there is no SELinux.
"aa-status" returns "apparmor module is loaded.", but I did not intentionally
configure anything for it.



> > Does the host use cgroupsv1 or cgroupsv2 or hybrid?
> The host system uses systemd v241, compiled with default-hierarchy=hybrid
>
> > Was the container configured to use either?
> The container uses systemd v251 with default-hierarchy=unified

At the host:
    # systemd --version
    systemd 241 (241)
    +PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS \
    +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid

In the container:
    # systemd --version
    systemd 252 (252.2-1)
    +PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID \
    +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY \
    -P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON +UTMP \
    +SYSVINIT default-hierarchy=unified
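
To compare the two sides quickly, the default-hierarchy field can be pulled out of the version output. This is only a minimal sketch in POSIX shell; the helper name default_hierarchy is made up for illustration and is not part of systemd.

```shell
# Hypothetical helper: print the value of the default-hierarchy= field from
# `systemd --version` output read on stdin (prints nothing if it is absent).
default_hierarchy() {
    tr ' ' '\n' | sed -n 's/^default-hierarchy=//p'
}

# Example with the tail of the host output quoted above; prints "hybrid".
echo '+ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid' | default_hierarchy
```

On a live system this would be used as "systemd --version | default_hierarchy", once on the host and once inside the chroot.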



> > What is mounted to /sys/fs/cgroup and below?

At the host:
    # mount | grep /sys/fs/cgroup
    tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
    cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
    cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
    cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
    cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
    cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
    cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
    cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
    cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
    cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
    cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
    cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
    cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
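
The mount table above corresponds to what systemd calls the hybrid layout: v1 controllers plus a cgroup2 mount at /sys/fs/cgroup/unified. A rough sketch of that classification, assuming the usual "mount" output format ("DEV on PATH type FSTYPE (OPTS)"); the function name cgroup_layout is hypothetical:

```shell
# Hypothetical helper: classify the cgroup layout from `mount` output on stdin.
# Prints one of: unified, hybrid, legacy, unknown.
cgroup_layout() {
    awk '
        $3 == "/sys/fs/cgroup" && $5 == "cgroup2"           { root_v2 = 1 }
        $3 == "/sys/fs/cgroup/unified" && $5 == "cgroup2"   { sub_v2 = 1 }
        index($3, "/sys/fs/cgroup/") == 1 && $5 == "cgroup" { v1 = 1 }
        END {
            if (root_v2)           print "unified"
            else if (v1 && sub_v2) print "hybrid"
            else if (v1)           print "legacy"
            else                   print "unknown"
        }'
}

# Example with three of the host lines shown above; prints "hybrid".
cgroup_layout <<'EOF'
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
EOF
```

On a live system the same check would be "mount | cgroup_layout".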



> > This is new payload on old host?

Yes, it is a test of running a quite recent Debian Bookworm/testing system
on an older Debian Buster host with kernel 4.19.260-1.



> > if you force container into cgroupsv1 mode as the host (by adding
> > systemd.unified_cgroup_hierarchy=no to the nspawn cmdline, does that
> > work?

I am not sure if I am using it right, but as far as I see
"systemd.unified_cgroup_hierarchy=no" does not help.
I added "debug" too, see below in [1].




> > Also, please provide the relevant output from "strace -f -s 500 -y -o
> > /tmp/log.strace" (put on some pastebin)

The following pastebin contains the last quarter of the log.strace
file recorded by the command in [1]:

   https://paste.debian.net/1262752/




I thought that if strace can observe the process in question, gdb should be
able to as well. I found that starting nspawn under gdbserver, with
'set follow-fork-mode child', and running gdb from inside the container via
a plain chroot seems to work well.

So it looks like the failing "syscall_0x1b7" from strace is "faccessat2" [2].
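
strace prints a name like "syscall_0x1b7" when its own syscall table has no entry for that number, which is expected with the older strace on the Buster host. Converting the hex value gives 439, the number assigned to faccessat2 on x86-64. A small sketch; the helper name strace_syscall_nr is made up for illustration:

```shell
# Hypothetical helper: turn strace's "syscall_0x..." placeholder into the
# decimal syscall number.
strace_syscall_nr() {
    printf '%d\n' "${1#syscall_}"
}

strace_syscall_nr syscall_0x1b7   # prints 439
```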

And it seems "faccessat2" was only added in kernel 5.8 [3],
so it might fail on kernel 4.19.
So I fear this needs a newer kernel, and/or this is more a glibc issue then?
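
A quick way to test a host for this is to compare its kernel release against 5.8. The sketch below assumes the release string starts with "MAJOR.MINOR" and only reflects the upstream version; distribution backports or syscall filtering on the host can change the real outcome. The function name kernel_has_faccessat2 is hypothetical.

```shell
# Hypothetical helper: succeed if the given kernel release is at least 5.8,
# the upstream version that introduced faccessat2().
kernel_has_faccessat2() {
    release=$1
    major=${release%%.*}        # text before the first dot
    rest=${release#*.}          # text after the first dot
    minor=${rest%%[!0-9]*}      # leading digits of the remainder
    [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "${minor:-0}" -ge 8 ]; }
}

if kernel_has_faccessat2 "4.19.0-22-amd64"; then
    echo "faccessat2 expected"
else
    echo "faccessat2 not expected"
fi
```

For the running kernel this would be called as kernel_has_faccessat2 "$(uname -r)".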


Kind regards,
Bernhard






[1]    # strace -f -s 500 -y -o /tmp/log.strace systemd-nspawn --directory=/var/lib/machines/test-bookworm --boot systemd.unified_cgroup_hierarchy=no debug
     Spawning container test-bookworm on /var/lib/machines/test-bookworm.
     Press ^] three times within 1s to kill container.
     systemd 252.2-1 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY -P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
     Detected virtualization systemd-nspawn.
     Detected architecture x86-64.
     Detected initialized system, this is not the first boot.
     Kernel version 4.19.0-22-amd64, our baseline is 4.15

     Welcome to Debian GNU/Linux bookworm/sid!

     Hostname set to <debian>.
     sd-netlink: Failed to enable NETLINK_GET_STRICT_CHK option, ignoring: Protocol not available
     Failed to add address 127.0.0.1 to loopback interface: Operation not permitted
     Failed to add address ::1 to loopback interface: Operation not permitted
     Failed to bring loopback interface up: Operation not permitted
     Setting '/proc/sys/fs/file-max' to '9223372036854775807
     '
     No credentials passed via fw_cfg.
     Failed to open '/sys/firmware/dmi/entries/11-0/raw', ignoring: No such file or directory
     Found cgroup on /sys/fs/cgroup/systemd, legacy hierarchy
     Using cgroup controller name=systemd. File system hierarchy is at /sys/fs/cgroup/systemd.
     Failed to create /init.scope control group: Operation not permitted
     Failed to allocate manager object: Operation not permitted
     [!!!!!!] Failed to allocate manager object.
     Exiting PID 1...
     Container test-bookworm failed with error code 255.





[2]
     (gdb) stepi
     0x00007ffff79c93ec      29        int ret = INLINE_SYSCALL_CALL (faccessat2, fd, file, mode, flag);
     1: x/i $pc
     => 0x7ffff79c93ec <__faccessat+44>:     syscall
     (gdb) bt
     #0  0x00007ffff79c93ec in __faccessat (fd=fd@entry=-100, file=file@entry=0x7fffffffe3c0 "/sys/fs/cgroup/systemd", mode=mode@entry=0, flag=flag@entry=256) at ../sysdeps/unix/sysv/linux/faccessat.c:29
     #1  0x00007ffff7c11380 in controller_is_v1_accessible (root=root@entry=0x0, controller=controller@entry=0x7ffff7f061ee "_systemd") at ../src/basic/cgroup-util.c:590
     #2  0x00007ffff7c12432 in cg_get_path_and_check (controller=0x7ffff7f061ee "_systemd", path=0x7fffffffe4e0 "/init.scope", suffix=0x0, fs=0x7fffffffe480) at ../src/basic/cgroup-util.c:612
     #3  0x00007ffff7b50eb0 in cg_create (controller=controller@entry=0x7ffff7f061ee "_systemd", path=path@entry=0x7fffffffe4e0 "/init.scope") at ../src/shared/cgroup-setup.c:292
     #4  0x00007ffff7b511db in cg_create_and_attach (controller=controller@entry=0x7ffff7f061ee "_systemd", path=path@entry=0x7fffffffe4e0 "/init.scope", pid=pid@entry=0) at ../src/shared/cgroup-setup.c:324
     #5  0x00007ffff7e3faa4 in manager_setup_cgroup (m=0x55555556edb0) at ../src/core/cgroup.c:3468
     #6  0x00007ffff7ea463b in manager_new (scope=<optimized out>, test_run_flags=MANAGER_TEST_NORMAL, _m=_m@entry=0x7fffffffe600) at ../src/core/manager.c:939
     #7  0x000055555555bf5c in main (argc=3, argv=0x7fffffffecd8) at ../src/core/main.c:2928
     (gdb) print/x $eax
     $1 = 0x1b7
     (gdb) stepi
     29        int ret = INLINE_SYSCALL_CALL (faccessat2, fd, file, mode, flag);
     1: x/i $pc
     => 0x7ffff79c93ee <__faccessat+46>:     cmp    $0xfffffffffffff000,%rax
     (gdb) print/x $eax
     $2 = 0xffffffff

     (gdb) list faccessat.c:29
     24
     25
     26      int
     27      __faccessat (int fd, const char *file, int mode, int flag)
     28      {
     29        int ret = INLINE_SYSCALL_CALL (faccessat2, fd, file, mode, flag);
     30      #if __ASSUME_FACCESSAT2
     31        return ret;
     32      #else
     33        if (ret == 0 || errno != ENOSYS)

     (gdb) list cgroup-util.c:590
     585             /* If root if specified, we check that:
     586              * - possible subcgroup is created at root,
     587              * - we can modify the hierarchy. */
     588
     589             cpath = strjoina("/sys/fs/cgroup/", dn, root, root ? "/cgroup.procs" : NULL);
     590             return laccess(cpath, root ? W_OK : F_OK);
     591     }
     592
     593     int cg_get_path_and_check(const char *controller, const char *path, const char *suffix, char **fs) {
     594             int r;

     (gdb) list cgroup-util.c:612
     607                      * except for the named hierarchies */
     608                     if (startswith(controller, "name="))
     609                             return -EOPNOTSUPP;
     610             } else {
     611                     /* Check if the specified controller is actually accessible */
     612                     r = controller_is_v1_accessible(NULL, controller);
     613                     if (r < 0)
     614                             return r;
     615             }
     616
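
The second "print/x $eax" in the gdb session in [2] can be decoded as a signed 32-bit value. A negative syscall return is -errno, and on Linux errno 1 is EPERM while ENOSYS is 38 (which would show up as 0xffffffda). The value seen here is therefore -EPERM, matching the "Operation not permitted" in [1], while the glibc listing only tests for ENOSYS before its non-faccessat2 path. That suggests the call is being rejected rather than reported as unimplemented. A minimal sketch of the decoding, assuming a shell with 64-bit arithmetic; the helper name eax_to_signed is made up:

```shell
# Hypothetical helper: interpret a 32-bit register value (as printed by gdb's
# "print/x $eax") as a signed integer.
eax_to_signed() {
    v=$(($1))
    if [ "$v" -ge 2147483648 ]; then
        v=$((v - 4294967296))
    fi
    echo "$v"
}

eax_to_signed 0xffffffff   # prints -1, i.e. -EPERM
eax_to_signed 0xffffffda   # prints -38, i.e. -ENOSYS
```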



[3]
     https://bugs.archlinux.org/task/69563
     https://man.archlinux.org/man/faccessat2.2.en
       "faccessat2() was added to Linux in version 5.8."


