[systemd-devel] logind vs CAP_SYS_ADMIN-lessness

Lennart Poettering lennart at poettering.net
Mon Jan 26 18:08:16 PST 2015

On Fri, 23.01.15 19:35, Christian Seiler (christian at iwakd.de) wrote:

>  - explicitly enable getty@tty{1,2,3,4}.service

Why? This cannot work. The getty services assume a Linux console tty:
they issue ioctls and ANSI sequences that only the Linux console
supports, and do VT management on it.

/dev/tty1, /dev/tty2 and so on must refer to proper VT devices, that
come with the matching files in /sys/class/tty/, with matching
/dev/vcsa* and so on. Mounting a pseudo-tty to /dev/tty1, /dev/tty2 and
so on is a *really* bad idea.

If LXC suggests such a configuration, then please talk to the LXC
guys, this is *very* broken and should not be done. You'll confuse
systemd, logind and the gettys with that. You'll get an incorrect
TERM, and everything else will be fucked up, too.

Also, there's really no reason to do this. Just create as many ptys as
you wish, and then pass $container_ttys= to PID 1 with their names,
and systemd will do the right thing. See
for details.
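
As a rough sketch of what a container manager could do here (not
taken from LXC or nspawn): allocate the ptys, then build the value
passed to PID 1 as container_ttys=. The helper names are made up for
illustration, and the format (space-separated names with the /dev/
prefix stripped, e.g. "pts/0 pts/1") is my reading of the container
interface, so please check it against the documentation.

```python
import os


def allocate_ptys(count):
    """Allocate `count` ptys and return (master_fd, slave_fd, slave_path) tuples.

    Hypothetical helper. A real container manager would allocate these from
    the container's own devpts instance and keep the master ends open.
    """
    result = []
    for _ in range(count):
        master_fd, slave_fd = os.openpty()
        result.append((master_fd, slave_fd, os.ttyname(slave_fd)))
    return result


def container_ttys_value(device_paths):
    """Build the container_ttys= value from a list of /dev/... paths.

    Assumed format: space-separated names without the leading "/dev/".
    """
    names = []
    for path in device_paths:
        if not path.startswith("/dev/"):
            raise ValueError("expected a path under /dev/: %r" % path)
        names.append(path[len("/dev/"):])
    return " ".join(names)
```

The manager would then put container_ttys=<that value> into the
environment of the container's PID 1, for example via
container_ttys_value([p for _, _, p in allocate_ptys(4)]).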

>  - no ConditionPathExists=/dev/tty0 for getty@.service

Yeah, well, this stuff is there for a reason. Don't remove that
stuff. This automatically disables the VT logic if no VT is
available. You shouldn't hack around that. Just use proper container
gettys instead ("container-getty@.service", which are automatically
instantiated via $container_ttys= among others).
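
To make the effect of that condition concrete, here is a minimal
sketch of the check it amounts to. This is only an illustration in
Python, not how systemd implements it, and the function name is
invented:

```python
import os


def vt_available(tty0_path="/dev/tty0"):
    """Return True if the VT device node exists.

    Roughly what ConditionPathExists=/dev/tty0 tests: when the path is
    missing, the VT getty is skipped rather than started and failing.
    """
    return os.path.exists(tty0_path)
```

In a container without VTs this returns False, which is exactly the
case where the VT gettys should stay off and the container gettys
take over.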

>  - mask systemd-udevd.service (haven't tested if that's actually needed,
>    the lxc-debian template also does this however)

There's no point in doing that. udev uses
ConditionPathIsReadWrite=/sys anyway, and is automatically skipped
hence when /sys is read-only. Container managers really should set up
/sys read-only in containers, so that the various containers don't
confuse each other by all trying to manage and change /sys, and more
importantly cannot fuck with security-sensitive settings.
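
For illustration, a check in the spirit of
ConditionPathIsReadWrite=/sys can be sketched like this. It is an
approximation based on the mount flags, not systemd's actual code,
and the function name is hypothetical:

```python
import os


def path_is_read_write(path):
    """Return True if `path` exists and its file system is not mounted read-only.

    Approximation only: looks at the ST_RDONLY mount flag via statvfs().
    A missing or inaccessible path counts as "not read-write".
    """
    try:
        flags = os.statvfs(path).f_flag
    except OSError:
        return False
    return not (flags & os.ST_RDONLY)
```

With /sys mounted read-only in the container, path_is_read_write("/sys")
is False there, so a unit guarded by the condition is skipped without
any masking.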

>  - touch /etc/fstab if you debootstrap it directly

You can just remove it. You don't need it in containers (and not even
on most hosts, unless you actually need to refer to external
partitions that cannot be auto-configured).

>  - I hope I didn't forget anything

I spent quite some time to ensure that systemd systems work
out-of-the-box in container managers. Any container manager that
implements this stuff
should just work out-of-the-box, without *any* modification of the
system to boot.

> > I am tempted to just
> > change nspawn to mount a private tmpfs into /run/user, too, as it
> > already mounts /run anyway.
> That would solve /run-quota issues for CAP_SYS_ADMIN-less containers,
> but is unnecessary (although harmless) for those that do have it.

I decided against doing this after all. I think that systemd in a
container and on bare metal should work as similarly as possible, and
thus not have orthogonal setups in /run. Hence, we either do something
(possibly skipping it on missing perms) or we don't do it at all,
but we don't do completely different things in different cases.

> (Note that in Debian you can also configure it to be on the same tmpfs
> as /run, but since on Debian it has mode 1777, there's a good reason NOT
> to do that.)

Yuck. Maybe Debian should lock that down. World-writable directories
are dangerous, nobody should use that.


Lennart Poettering, Red Hat
