[systemd-devel] the need for a discoverable sub-volumes specification

Tue Nov 9 20:57:04 UTC 2021

On Tue, 09.11.21 14:48, Ludwig Nussel (ludwig.nussel at suse.de) wrote:

> > and so on. Until boot succeeds in which case we'd rename it:
> >
> >    /@auto/root-x86-64:fedora_36.0
> >
> > i.e. we'd drop the counting suffix.
>
> Thanks for the explanation and pointer!
>
> Need to think aloud a bit :-)
>
> That method basically works for systems with read-only root. Ie where
> the next OS to boot is in a separate snapshot, eg MicroOS.
> A traditional system with rw / on btrfs would stay on the same subvolume
> though. Ie the "root-x86-64:fedora_36.0" volume in the example. In
> openSUSE package installation automatically leads to ro snapshot
> creation. In order to fit in I suppose those could then be named eg.
> "root-x86-64:fedora_36.N+0" with increasing N. Due to the +0 the
> subvolume would never be booted.
>
> Anyway, let's assume the ro case and both efi partition and btrfs volume
> use this scheme. That means each time some packages are updated we get a
> new subvolume. After reboot the initrd in the efi partition would try to
> boot that new subvolume. If it reaches systemd-bless-boot.service the
> new subvolume becomes the default for the future.
>
> So far so good. What if I discover later that something went wrong
> though? Some convenience tooling to mark the current version bad again
> would be needed.

In the sd-boot/kernel case you can, any time you like, rename an entry
to "…+0" to mark it as "bad"; you could drop the suffix to mark it as
"good", or you could rename it to "…+3" to mark it as
"don't-know/try-again".
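
To make the three renames concrete, here is a rough Python sketch that
only computes the new name. It assumes the counter is a trailing
"+LEFT" or "+LEFT-DONE" on a directory name with no file extension
after it. The helper names (strip_counter, mark), the state labels and
the default of 3 tries are made up for illustration and not part of
any existing tool.

```python
import re

# Trailing boot-counting suffix: "+LEFT" or "+LEFT-DONE" (assumed form).
_COUNTER_RE = re.compile(r"\+(\d+)(?:-(\d+))?$")


def strip_counter(name: str) -> str:
    """Return the entry name without any trailing counting suffix."""
    return _COUNTER_RE.sub("", name)


def mark(name: str, state: str, tries: int = 3) -> str:
    """Compute the new name for an entry (hypothetical helper).

    "good"  -> suffix dropped
    "bad"   -> "+0", i.e. no tries left
    "retry" -> "+<tries>", i.e. don't-know/try-again
    """
    base = strip_counter(name)
    if state == "good":
        return base
    if state == "bad":
        return f"{base}+0"
    if state == "retry":
        return f"{base}+{tries}"
    raise ValueError(f"unknown state: {state!r}")


if __name__ == "__main__":
    entry = "root-x86-64:fedora_36.0+2-1"
    for state in ("good", "bad", "retry"):
        print(state, "->", mark(entry, state))
```

The actual rename (e.g. via os.rename()) is deliberately left out,
since whether it works on the subvolume you are currently booted into
is exactly the open question.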

Now, at least in theory we could declare the same for this new
directory auto-discovery scheme. But I am not entirely sure this will
work out trivially IRL because I have the suspicion one cannot rename
subvolumes which are the source of a bind mount (i.e. once you boot
into one root subtree, then it might be impossible to rename that
top-level inode without rebooting first). Would be something to try
out. If it doesn't work it might suffice to move things one level
down, i.e. that the dir that actually becomes root is
/@auto/root-x86-64:fedora_36.0/payload/ or so, instead of just
/@auto/root-x86-64:fedora_36.0/. I think that would work, and
might be desirable anyway so that the enumeration of entries doesn't
already leak fs attributes/ownership/access modes/… of the actual root
fs.

> But then having Tumbleweed in mind it needs some capability to boot any
> old snapshot anyway. I guess the solution here would be to just always
> generate a bootloader entry, independent of whether a kernel was
> included in an update. Each entry would then have to specify kernel,
> initrd and the root subvolume to use.
> This approach would work with a separate usr volume also. In that case
> kernel, initrd, root and usr volume need to be linked by means of a
> bootloader entry.

For the GPT case if you want to bind a kernel together with a specific
root fs, you'd do this by specifying 'root=PARTLABEL=fooos_0.3' on the
kernel cmdline. I'd take inspiration from that and maybe introduce
'rootentry=fedora_36.2' or so which would then be honoured by the logic
we are discussing here, and would hard override which subdir to use,
regardless of versioning preference, assessment counting and so on.
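
As a rough sketch of what "hard override" could mean, in Python: look
for the option on the kernel command line and, if present, use it
instead of whatever the automatic selection picked. Note that
'rootentry=' is only the name floated above and does not exist today;
the function names, the simple whitespace splitting and the
"last occurrence wins" rule are assumptions made for illustration.

```python
from typing import Optional


def find_rootentry(cmdline: str) -> Optional[str]:
    """Return the value of a (hypothetical) rootentry= option, or None.

    Splits on whitespace only; quoted values are not handled here.
    """
    value = None
    for word in cmdline.split():
        key, sep, val = word.partition("=")
        if key == "rootentry" and sep:
            value = val  # assumed: the last occurrence wins
    return value


def choose_entry(cmdline: str, auto_selected: str) -> str:
    """Prefer an explicit rootentry= over the automatically picked entry."""
    forced = find_rootentry(cmdline)
    return forced if forced is not None else auto_selected


if __name__ == "__main__":
    # On a running system the command line could be read from /proc/cmdline.
    cmdline = "quiet rootentry=fedora_36.2"
    print(choose_entry(cmdline, auto_selected="fedora_36.3"))
```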

(Yeah, the subvol= mount option for btrfs would work too, but as
mentioned I'd keep this reasonably independent of btrfs where it's
easy; plain dirs are otherwise fine too, after all. Which reminds me,
recent util-linux implements the X-mount.subdir= mount option, which
means one could also use 'rootflags=X-mount.subdir=@auto/fedora_36.2'
as a non-btrfs-specific way to express the btrfs-specific
'rootflags=subvol=@auto/fedora_36.2'.)
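
For tooling that generates boot loader entries, the two spellings
differ only in the option name, roughly as in the Python sketch below.
The function name and its btrfs_specific parameter are made up for
illustration; only the two mount options themselves come from the
text above.

```python
def rootflags(subdir: str, btrfs_specific: bool = False) -> str:
    """Build a rootflags= kernel command line argument for a subdir.

    btrfs_specific=False -> X-mount.subdir= (handled by util-linux mount)
    btrfs_specific=True  -> subvol=         (handled by btrfs itself)
    """
    key = "subvol" if btrfs_specific else "X-mount.subdir"
    return f"rootflags={key}={subdir}"


if __name__ == "__main__":
    print(rootflags("@auto/fedora_36.2"))
    print(rootflags("@auto/fedora_36.2", btrfs_specific=True))
```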

Lennart

--
Lennart Poettering, Berlin