<html> <head> <meta content="text/html; charset=utf-8" http-equiv="Content-Type"> </head> <body text="#000000" bgcolor="#FFFFFF"> <div class="moz-cite-prefix">On 02/10/2016 04:27 PM, Lennart Poettering wrote: </div> <blockquote cite="mid:20160210212749.GA18538@gardel-login" type="cite"> <pre wrap="">On Wed, 10.02.16 15:58, Daniel J Walsh (<a class="moz-txt-link-abbreviated" href="mailto:dwalsh@redhat.com">dwalsh@redhat.com</a>) wrote: </pre> <blockquote type="cite"> <blockquote type="cite"> <blockquote type="cite"> <blockquote type="cite"> <blockquote type="cite"> <pre wrap=""> sed -i 's/^enable/disable/g' /lib/systemd/system-preset/* </pre> </blockquote> <pre wrap="">Why would this matter? </pre> </blockquote> <pre wrap="">We don't want excess services running inside of a docker container. I only want systemd/journald and any services that I enable in the container. Not something pulled in because the installer thinks this is a VM or a Host OS. </pre> </blockquote> <pre wrap="">Well, the default preset policy in Fedora is to disable everything by default, modulo a few exceptions. Hence it should be unnecessary to change anything with the default preset policy, unless you actually want to *enable* rather than disable more by default... </pre> </blockquote> <pre wrap=""> Here is what I see enabled in the base container. I don't think we want any of this stuff running by default in a docker container. </pre> </blockquote> <pre wrap=""> […] Well, but pretty much all the units you listed here are units from RPMs you wouldn't install in a container anyway, aren't they? Thus, they shouldn't matter anyway, and I'd argue they should be enabled by default in a container too – if they are installed explicitly by the user, through RPM. Hence, I think patching the preset stuff is not necessary at all. </pre> <blockquote type="cite"> <blockquote type="cite"> <pre wrap="">I don't see why one would want to mask systemd-logind.service. 
If you permit logins and PAM at all, you really need that. </pre> </blockquote> <pre wrap=""> If I wanted to add a login program I could enable/unmask these. No one runs docker containers as login services, that would require getty. </pre> </blockquote> <pre wrap=""> Well, "machinectl shell", "cron" and all those things do PAM... In fact the fact that "machinectl shell" goes through PAM and registers with logind through that is one of the major benefits over naked "nsenter". </pre> </blockquote> I wonder if any of these work correctly inside of a docker container? Can these be customized, or do they require systemd as pid 1 inside of the container? Docker has a "docker exec" command which does the correct thing: it puts the command inside of the container's namespaces, cgroup, SELinux label, capabilities ... <blockquote cite="mid:20160210212749.GA18538@gardel-login" type="cite"> <pre wrap=""> I can see that you don't want to run it by default, but maybe we can rearrange things so that logind is started on first use (i.e. on the first PAM conversation). That way logind would normally not run in a container, until it is actually requested by PAM conversation. We could even add exit-on-idle so that it goes away after a while when the user logs out again. That way logind could stay available but would normally not appear in "ps" unless it is actually used. I added this to the TODO list now. </pre> </blockquote> Sounds fine with me. I went back to the original container and I can remove all of the other modifications; I can live with the warnings at the beginning and remove the /etc/fstab. We just need to get this into more people's hands to see what happens and what breaks. 
As far as Hugepages is concerned, it seems there is some discussion on it here: <a class="moz-txt-link-freetext" href="https://bugzilla.redhat.com/show_bug.cgi?id=1199164">https://bugzilla.redhat.com/show_bug.cgi?id=1199164</a> <blockquote cite="mid:20160210212749.GA18538@gardel-login" type="cite"> <pre wrap=""> </pre> <blockquote type="cite"> <blockquote type="cite"> <pre wrap="">And masking the getty stuff appears to be entirely unnecessary... </pre> </blockquote> <pre wrap="">Again the goal is just to get rid of the getty failure message at bootup. </pre> </blockquote> <pre wrap=""> But there should really be none with current systemd, as you don't have /dev/tty0 and the getty unit has ConditionPathExists=/dev/tty0. What precisely does the getty message look like that you get? </pre> </blockquote> This is what I am seeing now with just /etc/fstab removed. <pre>Welcome to Fedora 23 (Twenty Three)! Set hostname to &lt;654f7872d331&gt;. dev-hugepages.mount: Cannot add dependency job, ignoring: Unit dev-hugepages.mount is masked. sys-fs-fuse-connections.mount: Cannot add dependency job, ignoring: Unit sys-fs-fuse-connections.mount is masked. systemd-remount-fs.service: Cannot add dependency job, ignoring: Unit systemd-remount-fs.service is masked. systemd-logind.service: Cannot add dependency job, ignoring: Unit systemd-logind.service is masked. getty.target: Cannot add dependency job, ignoring: Unit getty.target is masked. </pre> <blockquote cite="mid:20160210212749.GA18538@gardel-login" type="cite"> <pre wrap=""> </pre> <blockquote type="cite"> <blockquote type="cite"> <pre wrap="">Which leaves the /dev/hugepages and /sys/fs/fuse/connections mounts. Not sure about those. Are you running the container with CAP_SYS_ADMIN? If so, then there's no reason to mask those units. If not, then I figure we could add checks that these are conditioned out if CAP_SYS_ADMIN is missing. 
</pre> </blockquote> <pre wrap=""> No, docker containers do not enable SYS_ADMIN or NET_ADMIN by default. </pre> </blockquote> <pre wrap=""> I'll add a ConditionCapability=CAP_SYS_ADMIN line to the fuse mount. The hugepages mount already has one (since 218). With that addition there should really be no reason to mask out either of the units explicitly, systemd should already silently skip them in a docker setup where CAP_SYS_ADMIN is missing. </pre> <blockquote type="cite"> <blockquote type="cite"> <pre wrap="">On nspawn these two aren't seen since nspawn actually doesn't mount the real sysfs to /sys, but just a tmpfs with a select number of subdirectories from the real sysfs for security reasons. One of the subdirs that are suppressed is /sys/fs. Now, sys-fs-fuse-connections.mount is conditionalized on /sys/fs/fuse/connections existing, hence if it is not there, then it won't be mounted. And /dev/hugepages we simply allow to be mounted in the container. </pre> </blockquote> <pre wrap=""> Interesting idea. Maybe we should just mount over /sys/fs also. </pre> </blockquote> <pre wrap=""> Well, note that we over-mount /sys with a tmpfs, and then some parts of the real /sys into that. /sys/fs hence is just a subdir of our private tmpfs. The tmpfs is marked r/o after everything is set up. </pre> <blockquote type="cite"> <pre wrap="">Do you just mount hugepages then during container setup? </pre> </blockquote> <pre wrap=""> No. In nspawn, when we pass CAP_SYS_ADMIN to the container the container will just mount /dev/hugepages correctly on its own. And if we do drop CAP_SYS_ADMIN then the ConditionCapability=CAP_SYS_ADMIN in the unit file mentioned above will result in the mount being skipped silently already. Lennart </pre> </blockquote> </body> </html>