[systemd-devel] I want to run systemd inside of a locked down base docker container

Daniel J Walsh dwalsh at redhat.com
Wed Feb 10 22:43:23 CET 2016



On 02/10/2016 04:27 PM, Lennart Poettering wrote:
> On Wed, 10.02.16 15:58, Daniel J Walsh (dwalsh at redhat.com) wrote:
>>>>>>     sed -i 's/^enable/disable/g' /lib/systemd/system-preset/* 
>>>>> Why would this matter?
>>>> We don't want excess services running inside of a docker container.  I
>>>> only want systemd/journald and any services
>>>> that I enable in the container.   Not something pulled in because the
>>>> installer thinks this is a VM or a Host OS.
>>> Well, the default preset policy in Fedora is to disable everything by
>>> default, modulo a few exceptions. Hence it should be unnecessary to
>>> change anything with the default preset policy, unless you actually
>>> want to *enable* rather than disable more by default...
>> Here is what I see enabled in the base container.  I don't think we
>> want any of this stuff running by default in a docker container.
> […]
>
> Well, but pretty much all the units you listed here are units from
> RPMs you wouldn't install in a container anyway, aren't they? Thus,
> they shouldn't matter anyway, and I'd argue they should be enabled by
> default in a container too – if they are installed explicitly by the
> user, through RPM. Hence, I think patching the preset stuff is not
> necessary at all.
>
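For reference, the effect of the sed substitution quoted at the top can be tried on a throwaway preset list instead of the real files under /lib/systemd/system-preset/. This is only a sketch: the unit names below are made up for illustration, and "flip_presets" is a hypothetical helper name, not something shipped by systemd.

```shell
# Rewrite every "enable" directive in a preset list (read from stdin)
# into "disable", i.e. the same substitution as the sed command quoted
# earlier in the thread, but without editing any file in place.
flip_presets() {
    sed 's/^enable/disable/'
}

# Hypothetical preset content, for illustration only.
printf '%s\n' 'enable sshd.service' 'enable chronyd.service' 'disable *' | flip_presets
```

Lines that already say "disable" pass through unchanged, so with a disable-by-default policy only the few explicit exceptions are affected.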
>>> I don't see why one would want to mask systemd-logind.service. If you
>>> permit logins and PAM at all, you really need that. 
>> If I wanted to add a login program I could enable/unmask these.
>> No one runs docker containers as login services, that would require
>> getty. 
> Well, "machinectl shell", "cron" and all those things do PAM... In
> fact the fact that "machinectl shell" goes through PAM and registers
> with logind through that is one of the major benefits over naked
> "nsenter".
I wonder if any of these work correctly inside of a docker container?

Can these be customized, or do they require systemd as PID 1 inside of
the container?  Docker has a "docker exec" command which does the
correct thing: it puts the command inside of the container's
namespaces, cgroup, SELinux label, capabilities ...
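One way to check what a process entered this way actually ends up with is to read its namespaces, cgroup, capability masks and SELinux label back from /proc. The following is a minimal sketch, not a docker feature: "show_context" and the PROC_ROOT override are names invented here, and the output depends entirely on the system it runs on.

```shell
# Print the confinement context of a process as seen through /proc.
# PROC_ROOT is a hypothetical override (default: /proc) so the sketch
# can be pointed at a test directory instead of the live system.
show_context() {
    pid=${1:-self}
    dir=${PROC_ROOT:-/proc}/$pid

    echo "namespaces:"
    ls "$dir/ns" 2>/dev/null | sed 's/^/  /'

    echo "cgroup:"
    cat "$dir/cgroup" 2>/dev/null | sed 's/^/  /'

    echo "capabilities:"
    grep '^Cap' "$dir/status" 2>/dev/null | sed 's/^/  /'

    echo "selinux label:"
    label=""
    if [ -r "$dir/attr/current" ]; then
        # The kernel may terminate the label with a NUL byte; strip it.
        label=$(tr -d '\000' < "$dir/attr/current" 2>/dev/null) || label=""
    fi
    echo "  ${label:-(unavailable)}"
}

show_context
```

Running the same function on the host and inside the container (for example from a shell started with "docker exec") and comparing the two outputs should show whether the command really landed in the container's context.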
> I can see that you don't want to run it by default, but maybe we can
> rearrange things so that logind is started on first use (i.e. on the
> first PAM conversation). That way logind would normally not run in a
> container, until it is actually requested by PAM conversation. We
> could even add exit-on-idle so that it goes away after a while when
> the user logs out again.
>
> That way logind could stay available but would normally not appear in
> "ps" unless it is actually used.
>
> I added this to the TODO list now.
Sounds fine to me.  I went back to the original container and I can
remove all of the other modifications.  I can live with the warnings at
the beginning and just remove /etc/fstab.  We just need to get this into
more people's hands to see what happens and what breaks.

As far as hugepages is concerned, there seems to be some discussion of
it here:

https://bugzilla.redhat.com/show_bug.cgi?id=1199164
>>> And masking the getty stuff appears to be entirely unnecessary...
>> Again the goal is just to get rid of the getty failure message at
>> bootup.
> But there should really be none with current systemd, as you don't
> have /dev/tty0 and the getty unit has ConditionPathExists=/dev/tty0. 
>
> What precisely does the getty message that you get look like?

This is what I am seeing now with just /etc/fstab removed.

Welcome to Fedora 23 (Twenty Three)!

Set hostname to <654f7872d331>.
dev-hugepages.mount: Cannot add dependency job, ignoring: Unit dev-hugepages.mount is masked.
sys-fs-fuse-connections.mount: Cannot add dependency job, ignoring: Unit sys-fs-fuse-connections.mount is masked.
systemd-remount-fs.service: Cannot add dependency job, ignoring: Unit systemd-remount-fs.service is masked.
systemd-logind.service: Cannot add dependency job, ignoring: Unit systemd-logind.service is masked.
getty.target: Cannot add dependency job, ignoring: Unit getty.target is masked.
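
To see at a glance which units a boot log reports as masked, the messages can be filtered mechanically. A minimal sketch, assuming the message format stays exactly as shown above; "list_masked_units" is a made-up helper name:

```shell
# Print the name of every unit that a boot log (read from stdin)
# reports as masked, one unit per line.
list_masked_units() {
    sed -n 's/^\([^:]*\): Cannot add dependency job, ignoring: Unit .* is masked\.$/\1/p'
}

# Sample input: three lines taken from the log above.
list_masked_units <<'EOF'
Set hostname to <654f7872d331>.
dev-hugepages.mount: Cannot add dependency job, ignoring: Unit dev-hugepages.mount is masked.
getty.target: Cannot add dependency job, ignoring: Unit getty.target is masked.
EOF
```

For the sample input this prints dev-hugepages.mount and getty.target; lines that do not match the pattern are dropped.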


>>> Which leaves the /dev/hugepages and /sys/fs/fuse/connections
>>> mounts. Not sure about those. Are you running the container with
>>> CAP_SYS_ADMIN? If so, then there's no reason to mask those units. If
>>> not, then I figure we could add checks that these are conditioned out
>>> if CAP_SYS_ADMIN is missing.
>> No, docker containers do not enable SYS_ADMIN or NET_ADMIN by
>> default.
> I'll add a ConditionCapability=CAP_SYS_ADMIN line to the fuse
> mount. The hugepages mount already has one (since 218).
>
> With that addition there should really be no reason to mask out either
> of the units explicitly, systemd should already silently skip them in
> a docker setup where CAP_SYS_ADMIN is missing.
>
>>> On nspawn these two aren't seen since nspawn actually doesn't mount
>>> the real sysfs to /sys, but just a tmpfs with a select number of
>>> subdirectories from the real sysfs for security reasons. One of the
>>> subdirs that are suppressed is /sys/fs. Now,
>>> sys-fs-fuse-connections.mount is conditionalized on
>>> /sys/fs/fuse/connections existing, hence if it is not there, then it
>>> won't be mounted. And /dev/hugepages we simply allow to be mounted in
>>> the container.
>> Interesting idea.  Maybe we should just mount over /sys/fs also.
> Well, note that we over-mount /sys with a tmpfs, and then some parts
> of the real /sys into that. /sys/fs hence is just a subdir of our
> private tmpfs. The tmpfs is marked r/o after everything is set up.
>
>> Do you just mount hugepages then during container setup?
> No. In nspawn, when we pass CAP_SYS_ADMIN to the container the
> container will just mount /dev/hugepages correctly on its own. And if
> we do drop CAP_SYS_ADMIN, then the ConditionCapability=CAP_SYS_ADMIN in
> the unit file mentioned above will result in the mount being skipped
> silently already.
>
> Lennart
>
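
To check by hand whether the ConditionCapability=CAP_SYS_ADMIN test discussed above would pass in a given container, the capability mask of PID 1 can be decoded. This is a sketch under stated assumptions: as far as I understand the documentation, the condition looks at the capability bounding set of the service manager (the CapBnd line in /proc/1/status); "has_cap" is a helper invented here; and the mask below is only an illustrative value of the kind commonly seen in a default docker container, not something measured in this thread.

```shell
# Return success if capability number $2 is set in the hex mask $1
# (a CapBnd/CapEff value as found in /proc/<pid>/status).
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

CAP_SYS_ADMIN=21   # capability number from <linux/capability.h>

# Illustrative mask only (assumption); substitute the value printed by
#   grep CapBnd /proc/1/status
# inside the container being examined.
mask=00000000a80425fb

if has_cap "$mask" "$CAP_SYS_ADMIN"; then
    echo "CAP_SYS_ADMIN present: the condition would pass"
else
    echo "CAP_SYS_ADMIN missing: the condition would fail and the unit be skipped"
fi
```

With the illustrative mask, bit 21 is clear, so the second branch is taken, which matches the expectation that the hugepages and fuse mounts are skipped silently when CAP_SYS_ADMIN is not granted.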
