[RFC] initoverlayfs - a scalable initial filesystem
Eric Curtin
ecurtin at redhat.com
Mon Dec 11 12:52:06 UTC 2023
On Mon, 11 Dec 2023 at 12:48, Eric Curtin <ecurtin at redhat.com> wrote:
>
> On Mon, 11 Dec 2023 at 11:51, Lennart Poettering <lennart at poettering.net> wrote:
> >
> > On Mo, 11.12.23 11:28, Eric Curtin (ecurtin at redhat.com) wrote:
> >
> > > > > For the items listed above I think you can find different solutions
> > > > > which do not necessarily compromise security as much.
> > > > >
> > > > > So, in the list above you could address the latter three like this:
> > > > >
> > > > > 2. Use an erofs rather than a packed cpio as initrd. Make the boot
> > > > > loader load the erofs into contiguous memory, then use memmap=X!Y on
> > > > > the kernel cmdline to synthesize a block device from that, which
> > > > > you then mount directly (without any initrd) via
> > > > > root=/dev/pmem0. This means your boot loader will still load the
> > > > > whole image into memory, but only decompress the bits actually
> > > > > needed. (It also has some other nice benefits I like, such as an
> > > > > immutable rootfs, which tmpfs-based initrds don't have.)
> > >
> > > What I am unsure about here, is the "make the bootloader load the
> > > erofs into contiguous memory" part. I wonder could we try and use the
> > > existing initramfs data as is.
> >
> > Today's initrds are packed cpio archives of an OS file system
> > hierarchy. What I proposed means you'd have to put the OS file system
> > hierarchy into an erofs image instead, which is a trivial operation,
> > just unpack and repack.
> >
> > Note that there are two concepts of "initrd" out there.
> >
> > a) from the kernel perspective an initrd/initramfs (which both are
> > badly named, because it's a tmpfs these days) is that packed cpio
> > archive that is unpacked into a tmpfs, and then jumped into.
> >
> > b) from systemd's perspective an initrd is an OS image that carries an
> > /etc/initrd-release file. If that file exists then systemd will not
> > boot up the system regularly, but instead just prepare everything
> > that it can transition into some other root fs.
> >
> > In real life, most initrds currently qualify under both
> > definitions, but there's no reason to always do this. You can also
> > have images the kernel would consider an initrd, but systemd does not,
> > which is something we use in the "USI" concept, i.e. "unified system
> > images", which are basically UKIs (large UKIs) with a complete rootfs
> > that is the main system of the OS. And you can also do it the other
> > way round, which is potentially what I am suggesting to you here: use
> > an erofs image that would not be considered an initrd by the kernel,
> > but that systemd would consider one, and transition out of.
> >
> > > I dunno if
> > > bootloaders make many assumptions about the format of that data, worst
> > > case scenario we could encapsulate erofs in the initramfs, cpio looking
> > > data.
> >
> > boot loaders generally don't bother with the cpio, it's just "data"
> > for them. Compression algorithms have changed in the past, and it only
> > mattered that the kernel could decompress it, the boot loader doesn't care.
> >
> > > Teach the kernel not to decompress and process the whole
> > > thing and mount it like an erofs alternatively. Does this sound crazy
> > > or reasonable?
> >
> > You are re-inventing the traditional "initrd" logic of the kernel
> > which was a ramdisk (i.e. a block device /dev/ram0), that was filled
> > with some fs of your choice loaded by the boot loader.
>
> Sort of yes, but preferably using that __initramfs_start /
> initrd_start buffer as is without copying any bytes anywhere else and
> without teaching the bootloaders to do things.
>
> The "memmap=" approach you suggested sounds like what we are thinking,
> but do you think we could do this without teaching bootloaders to do
> new things?
Like, could we do that with an "initrd3.0=on" karg, so that it just uses
__initramfs_start and __initramfs_size for the memmap? (That probably
wouldn't be the arg name, it's just for description purposes here;
maybe it's even a build-time flag, etc.)
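
To make the memmap= idea a bit more concrete, here is a rough sketch of
how a start address and size could be turned into the memmap=X!Y
argument (size first, then start). The function name, the example
address and the 4 KiB alignment check are assumptions for illustration
only; real pmem regions may need stricter alignment, and the values
would have to be physical addresses of wherever the image actually
lands.

```python
def memmap_pmem_arg(start: int, size: int, align: int = 4096) -> str:
    """Format a kernel command-line argument of the form memmap=<size>!<start>.

    `start` and `size` are byte values for a physical memory range.
    `align` is a hypothetical sanity check (4 KiB pages), not a kernel rule.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if start < 0:
        raise ValueError("start must not be negative")
    if start % align or size % align:
        raise ValueError(f"start and size must be multiples of {align} bytes")
    # Hex keeps large addresses readable on the command line.
    return f"memmap={size:#x}!{start:#x}"


if __name__ == "__main__":
    # Hypothetical example: a 64 MiB image placed at the 4 GiB mark.
    print(memmap_pmem_arg(0x1_0000_0000, 64 * 1024 * 1024))
```

With that argument on the cmdline the range should show up as a pmem
block device, which is what root=/dev/pmem0 in the suggestion above
refers to.
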
>
> Although the nice thing about a storage-init like approach is there's
> basically zero copies up front. What storage-init is trying to be, is
> a tool to just call systemd storage things, without also inheriting
> all the systemd stack.
>
> >
> > Lennart
> >
> > --
> > Lennart Poettering, Berlin
> >
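
On the point above that the initrd buffer is just "data" to the boot
loader: whatever consumes the buffer could tell the two formats apart
by looking at magic numbers. A minimal sketch, assuming an uncompressed
buffer; sniff_image is a made-up helper name, and the constants are the
cpio "newc" magic strings and the erofs superblock magic at offset 1024
as I understand the on-disk formats (worth double-checking against the
kernel headers).

```python
import struct

CPIO_NEWC_MAGICS = (b"070701", b"070702")  # newc, and newc with checksum
EROFS_SUPER_OFFSET = 1024                  # superblock starts 1 KiB in
EROFS_SUPER_MAGIC = 0xE0F5E1E2             # little-endian 32-bit magic


def sniff_image(buf: bytes) -> str:
    """Return "cpio", "erofs" or "unknown" for an uncompressed image buffer."""
    if buf[:6] in CPIO_NEWC_MAGICS:
        return "cpio"
    if len(buf) >= EROFS_SUPER_OFFSET + 4:
        (magic,) = struct.unpack_from("<I", buf, EROFS_SUPER_OFFSET)
        if magic == EROFS_SUPER_MAGIC:
            return "erofs"
    # Compressed initrds (gzip, zstd, ...) also end up here.
    return "unknown"
```

A compressed cpio would need to be identified by its compression magic
first, which this sketch does not attempt.
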
More information about the systemd-devel
mailing list