[systemd-devel] Security and technical differences between systemd-nspawn and OpenVZ / LXC
Vito Caputo
vcaputo at pengaru.com
Thu Jul 6 16:20:14 UTC 2023
On Thu, Jul 06, 2023 at 06:49:47PM +0300, Mantas Mikulėnas wrote:
> On Thu, Jul 6, 2023 at 6:05 PM Paulo Coghi - Coghi IT <paulocoghi at gmail.com>
> wrote:
>
> >
> > 4. Storage and Inodes
> > On OpenVZ, we could create "virtualized" file systems, like ploop, which
> > avoids consuming inodes on the host's file system, while lightweight enough
> > to provide near-native performance.
> > Is there any approach to have similar benefits through systemd-nspawn?
> >
>
> Nspawn supports running containers off a loop-mounted image, but nothing
> built-in with the same features, although ploop seems to be a fully
> separate kernel module (i.e. not strictly part of OpenVZ), so in theory you
> could still use it with nspawn. Alternatively, you could use regular loop
> devices (which can be space-efficient with all recent kernels, as they now
> support TRIM) if you don't need the snapshotting.
>
> Though, "consuming inodes" is only a problem with Ext4, isn't it? Does the
> same type of problem even exist on more modern filesystems like XFS or
> Btrfs?
>
When I was last deeply involved in containers (rkt dev), overlayfs was
the new shiny escape hatch, with its own set of problems. CoreOS was
using btrfs prior to that, but with no end in sight to btrfs bugs like
the ENOSPC issues, that ended in favor of overlayfs. Maybe it's better
today; SUSE seems to be making btrfs work for them.

XFS has reflink, which still burns inodes; otherwise you'd just have
hard links. IIRC reflinks share the extents until modified, and at
least need inodes for something to point at the shared extents. A
bonus is that the duplicated inode metadata may diverge, so
ownership/mode can change without affecting the other names sharing
the extents. The perf downside is that you incur the cost of creating
all those inodes when instantiating the container tree w/reflink.
Also, the buffer caches on reflinks aren't shared because the inodes
are distinct, even if the files are unmodified. At least that was the
case when I last checked (years ago).

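To make the hard link vs. reflink distinction concrete, here is a minimal Python sketch. It assumes Linux and uses the FICLONE ioctl (request value taken from <linux/fs.h>); the helper names clone_or_copy and demo are made up for illustration. If the filesystem doesn't support reflinks, it falls back to a plain copy, which burns an inode just the same but shares no extents.

```python
import fcntl
import os
import shutil
import tempfile

# FICLONE from <linux/fs.h>, i.e. _IOW(0x94, 9, int).
FICLONE = 0x40049409


def clone_or_copy(src, dst):
    """Try to reflink src to dst; fall back to a plain copy.

    Returns True only if the filesystem performed a reflink clone.
    (Hypothetical helper, for illustration only.)
    """
    with open(src, "rb") as s, open(dst, "wb") as d:
        try:
            fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
            return True
        except OSError:
            # Reflink not supported here: copy the bytes instead.
            shutil.copyfileobj(s, d)
            return False


def demo(workdir):
    """Create a file, a hard link to it, and a clone; compare inodes."""
    src = os.path.join(workdir, "base")
    with open(src, "wb") as f:
        f.write(b"x" * 4096)

    hard = os.path.join(workdir, "hardlink")
    os.link(src, hard)

    clone = os.path.join(workdir, "clone")
    reflinked = clone_or_copy(src, clone)

    src_ino = os.stat(src).st_ino
    return {
        "reflinked": reflinked,
        "hardlink_same_inode": os.stat(hard).st_ino == src_ino,
        "clone_same_inode": os.stat(clone).st_ino == src_ino,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo(tmp))
```

Run from a directory on the filesystem you care about (point demo() at a path there rather than the default temp dir). The hard link shares the inode; the clone gets its own whether or not the reflink succeeded, which is the inode cost described above.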
Overlayfs gives you shared buffer caches for unmodified files, but you
also get spurious oddities in the inode metadata pre vs. post copy-up
(on write). Not sure where that stands today. At the time we pivoted
to overlayfs at CoreOS, even rpm lock-files were breaking because of
this weirdness. Overlayfs is kind of a dirty hack, but it checks a few
desirable boxes in a sea of inadequate in-kernel filesystems.

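One way to look for the pre vs. post copy-up oddity is to compare a file's (st_dev, st_ino) before and after its first write. This is only a sketch: the function names are made up, it appends a byte to the file it tests, and whether it reports a change on overlayfs depends on the kernel version and mount options, which I haven't verified here. On an ordinary filesystem it should report no change.

```python
import os
import tempfile


def identity(path):
    """Return the (device, inode) pair that stat reports for path."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def identity_changes_on_write(path):
    """Append one byte to path and report whether its identity changed.

    WARNING: this modifies the file. On overlayfs, point it at a file
    that still lives only in the lower layer, so the write forces a
    copy-up.
    """
    before = identity(path)
    with open(path, "ab") as f:
        f.write(b"\n")
    after = identity(path)
    return before != after


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello")
    try:
        print(identity_changes_on_write(tmp.name))
    finally:
        os.unlink(tmp.name)
```

Anything that caches or compares inode identity across a write would be affected if this returns True for files inside the container tree.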
Not sure where OpenZFS stands here. It may be the bee's knees; perhaps
someone with experience can chime in?

Regards,
Vito Caputo