[systemd-devel] systemd-nspawn containers
Michał Zegan
webczat_200 at poczta.onet.pl
Fri Nov 11 18:21:19 UTC 2016
audit/autofs are not properly virtualized, I know. But I thought
keyrings and cgroups are.
On 11.11.2016 at 18:28, Lennart Poettering wrote:
> On Fri, 11.11.16 16:41, Michał Zegan (webczat_200 at poczta.onet.pl) wrote:
>
>> Thank you for your answers!
>>
>> What I meant by secure containers is mostly, containers that are or will
>> be secure enough to use them for things like virtual private server
>> hosting. Is nspawn intended to be usable for such things in the future,
>> or maybe it already is, or whatever?
>
> I run my own server this way, already as an exercise of dogfooding.
>
> So, yes, running a VPS like this certainly works, but do note that
> nspawn doesn't do orchestration or anything. It's good enough for me,
> but if you need fancy orchestration tools, nspawn won't be
> sufficient.
>
>> What kernel limitations do you mean when you say about security?
>
> Well, a lot of subsystems cannot be locked down properly for use in
> containers yet. You can lock down a lot, in particular if you use
> userns, but there are still a lot of holes in there, and userns
> itself in particular has been a major source of CVEs in recent
> kernels.
>
> Right now, "containers" in general are not about security. Some
> companies claim they are secure, but they really aren't. And that's
> not a bug in nspawn, or docker, or lxc for that matter, it's simply a
> limitation of the kernel.
>
> Or to say this differently: we'll do in nspawn everything we can to
> lock things down properly, but there are limits based on what the
> kernel provides... As the kernel gets improved in this area, we'll
> update nspawn to make use of it. We are sitting in the same boat in
> this regard as other container managers, and they have more or less
> the same limits we do.
>
>> For now I know that in full containers with userns file capabilities do
>> not work (I think), you have no virtualized /proc/meminfo and friends
>> (do cgroup namespaces give a chance to change that?), you cannot mknod
>> devices (no whitelist possible at this level), no FUSE support, no
>> kernel-level automatic uid shifting, no way to mount physical
>> filesystems in userns, and no way to have SELinux etc. per
>> container. Do you mean such limitations or something else?
>
> Well, devices are not virtualized at all (with the exception of
> network devices), which means no udev, no hotplug events and so
> on. Some container managers ignore this, and provide access to
> selected device nodes anyway, but we don't do anything like that in
> nspawn, since it's pretty broken (as /sys wouldn't match what you see
> in /dev). In general, I think people should just accept that
> containers mean "you don't get physical device access". And if you
> want physical device access, then don't use containers...
>
>> I am interested in this topic but it is quite hard for me to track
>> progress in that area (kernel side), even though I subscribe to some
>> kernel mailing lists and know about at least some of the submitted
>> patches. What else is missing that I didn't mention and that would
>> be good to have?
>
> Well, a lot of stuff is still not properly virtualized. What comes to
> mind: audit, autofs, keyring, cgroups, …
>
>> Also what about setting cgroup parameters per container? nspawn does not
>> allow doing that, and you probably do not intend it to be done by
>> overriding the container's scope unit settings, for example?
>
> You can actually do that just fine. Simply set it in the nspawn service
> file, or, if you run nspawn from the command line, with the "-p" switch.
> Or make your changes dynamically via "systemctl set-property". It's all
> supported and works well.
>
> Lennart
>
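The three ways listed above for setting cgroup parameters on a container can be illustrated with a minimal Python sketch. It only builds the commands and the drop-in text rather than running them, since running them needs root on a systemd host. Everything concrete here is an assumption, not taken from the thread: the container name "web", its directory /var/lib/machines/web, the helper function names, and the property names CPUWeight= and MemoryMax=, which require a reasonably recent systemd on the unified cgroup hierarchy (older releases and the legacy cgroup v1 hierarchy use CPUShares= and MemoryLimit= instead). The sketch uses the long option --property= for the command-line case, and it assumes a container started directly with systemd-nspawn is registered as a scope unit named machine-<name>.scope, while one started through systemd-nspawn@<name>.service runs inside that service unit.

```python
import shlex
from typing import Dict, List

# Hypothetical example values (not from the thread): container name,
# directory and resource limits are placeholders.
MACHINE = "web"
DIRECTORY = "/var/lib/machines/web"
LIMITS = {"CPUWeight": "50", "MemoryMax": "2G"}  # assumes unified cgroup hierarchy


def property_args(props: Dict[str, str]) -> List[str]:
    """Render unit properties as NAME=VALUE strings, as systemd expects."""
    return [f"{name}={value}" for name, value in props.items()]


def service_dropin(props: Dict[str, str]) -> str:
    """Option 1: drop-in text for systemd-nspawn@<name>.service, e.g. saved as
    /etc/systemd/system/systemd-nspawn@web.service.d/limits.conf (hypothetical name)."""
    return "\n".join(["[Service]"] + property_args(props)) + "\n"


def nspawn_argv(machine: str, directory: str, props: Dict[str, str]) -> List[str]:
    """Option 2: start nspawn from the command line; each --property= is
    applied to the scope unit nspawn registers for the machine."""
    argv = ["systemd-nspawn", "-M", machine, "-D", directory]
    return argv + [f"--property={p}" for p in property_args(props)]


def set_property_argv(unit: str, props: Dict[str, str], runtime: bool = False) -> List[str]:
    """Option 3: change properties of an already running unit."""
    argv = ["systemctl", "set-property"]
    if runtime:
        argv.append("--runtime")  # change is not persisted across reboots
    return argv + [unit] + property_args(props)


if __name__ == "__main__":
    print(service_dropin(LIMITS), end="")
    print(shlex.join(nspawn_argv(MACHINE, DIRECTORY, LIMITS)))
    # Assumes the default scope naming machine-<name>.scope for a plain name.
    print(shlex.join(set_property_argv(f"machine-{MACHINE}.scope", LIMITS, runtime=True)))
```

The argv lists could be handed to subprocess.run() on a host where you have the necessary privileges. Building them as lists rather than shell strings avoids quoting problems; shlex.join is used only for display.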
More information about the systemd-devel
mailing list