[systemd-devel] keeping a backup ESP partition in sync

Mon May 27 08:12:43 UTC 2024

On Mo, 27.05.24 09:48, Alexander Gordeev (alex at gordius.net) wrote:

> > That said, the intended semantics for that are not clear to me at
> > all. i.e. there are some options:
> >
> > 1. mount the current ("primary") ESP to /efi/, and operate exclusively
> >    on that, except that at the very end, after syncing, the ESP is
> >    dd'ed at the block level onto a set of matching partitions on
> >    other HDDs without any consideration of their current contents.
>
> Well, this means that the FAT filesystem IDs are going to be equal.
> This can be quite confusing, I think, since at the moment these IDs
> are the primary method to distinguish the filesystems when mounting
> them, right?

Yeah, it would be quite confusing. And yes, I think this is quite a
flawed approach in general.

> > 2. mount the current ("primary") ESP to /efi/, and expect that
> >    "secondary" ESPs are mounted to /efi.mirror/$DEVNAME/ or so, and
> >    then first operate on the primary ESP, and then only sync a very
> >    specific subset of dirs from the primary to the secondary ESPs,
> >    i.e. /loader/, /efi/Linux and /efi/systemd. Syncing would be
> >    "dumb", i.e. stupidly copy over, and remove dentries not existing
> >    in the source.
> >
> >    This is far from trivial to implement, because how would we even
> >    decide what to mount to /efi.mirror/$DEVNAME/, how would we expect
> >    users to mark the set of partitions? probably would require some
> >    udev rule, but that creates messy problems around waiting for these
> >    mirrors on boot (because we do update the ESP automatically at
> >    boot, for updating the random seed automatically, and more). After
> >    all it should be OK if mirrors go missing, but that means we cannot
> >    really delay booting waiting for them anymore (because we cannot
> >    distinguish the case "device is just slow to pop up" from the case
> >    "device is dead").
> >
> > 2b. same as 2, but try to be "smart" with syncing, i.e. look at file
> >     mtimes, and let the newer versions win. Probably doomed to fail,
> >     due to clock/timezone unreliability in early boot and in
> >     particular firmware writes.
> >
> > 3. some scheme where there's no primary nor secondary, but just an
> >    equal set of partitions. This is harder than it sounds, since it
> >    raises the question of what to do if updating some partitions
> >    works but things fail on others: do we undo the first change
> >    again, or do we just continue? If we declared one of the copies
> >    as "primary" (as suggested above) this problem goes away
> >    somewhat, since it would mean we could have strict success rules
> >    on the "primary" copy, and lax rules on the "secondary" copies.
> >    This also would have the problem that 3rd party tools are
> >    generally not ready to deal with the fact that there's more than
> >    one equivalent ESP.
> >
> > Hence, approach 2 is probably the best, but the waiting issue is a
> > major headache. it would probably mean we store away the list of
> > primary+secondary ESPs we have seen so far in a file in the ESP (which
> > is then sync'ed to all). And then add "bootctl wait-secondary-esps" or
> > so as a new tool that waits for them to show up, with some time-out
> > applied. But, uh, this gets so involved so quickly. (as you then
> > probably also need "bootctl add-secondary-esp" and "bootctl
> > remove-secondary-esp")
> >
> > But anyway, if this matters to you, feel free to send a patch for
> > this, but it's not really a job for a day or two, it's much more
> > involved than one might think.
>
> Well, my initial idea was to add a file e.g.
> /etc/systemd/bootparts.conf listing the UUIDs or even mountpoints of
> the filesystems. The 'bootctl install' and 'bootctl update' could go
> through the list and repeat exactly the same steps, when called from
> package/initramfs/kernel hooks. Does the config file have to be kept
> on the ESP? Probably for some dual boot scenarios?

Yes, this really belongs in a file copied into each ESP, so that the
ESPs are fully self-descriptive and equivalently powerful, and every
user of the ESP in multi-boot scenarios can participate in this
correctly (well, as long as they support this ESP syncing logic at
all). I.e. /efi/loader/esp-sync or so, as a file listing the partition
table entry UUIDs of the other ESPs.
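
To make the idea concrete, here is a minimal sketch in Python of how
such a listing could be read. Everything about it is an assumption for
illustration: the thread does not define a file format, so "one
partition UUID per line, with '#' comments" is made up here, and
parse_esp_sync() is a hypothetical helper, not an existing systemd
interface.

```python
import uuid


def parse_esp_sync(text):
    """Parse a hypothetical esp-sync listing into a list of UUID strings.

    Assumed format (not defined anywhere in systemd): one partition
    table entry UUID per line, '#' starts a comment, blank lines are
    ignored. UUIDs are normalised to lowercase and de-duplicated,
    keeping the order of first appearance.
    """
    result = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        try:
            value = str(uuid.UUID(line))
        except ValueError:
            # Malformed entry: skip it rather than refusing the whole file.
            continue
        if value not in result:
            result.append(value)
    return result
```

Skipping malformed lines instead of failing is a design choice in this
sketch, on the theory that a damaged entry for one mirror should not
prevent the remaining ESPs from being found.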

(And we probably need similar logic for XBOOTLDR.)

> And so you say, that the secondary ESPs will become not bootable after
> the next boot because of the writes done only in the primary ESP by
> firmware/sd-boot/sd-stub, right? If so, maybe this is indeed going to
> be very fragile...

The whole exercise is done to keep them bootable. So yes, the writes
done by firmware/boot loader are going to remain local to the ESP used
for booting, but that should be fine as long as we ensure after boot
that the differences are evened out, i.e. that "bootctl random-seed"
is used from userspace to place a fresh random seed on every listed
ESP, that "bootctl update" updates the boot loader in every listed
ESP, that "kernel-install" copies kernels into every listed ESP, that
"systemd-bless-boot" resets the boot counters for the booted kernel on
every ESP, and so on.

(The way I'd implement this is not by actually teaching these
commands individual multi-ESP support, but simply by implementing a
single sync_esp() call or so which syncs the relevant info from
primary to secondary ESPs correctly, and which each of these commands
just calls as its last step. For single-ESP setups this call would be
a NOP.)
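
As a rough illustration of the "dumb" variant of such a call, here is
a sketch in Python rather than the C it would really be written in.
The directory list follows the subset named earlier in the thread;
the function names, the use of mount point paths as arguments, and the
error handling are assumptions of this sketch, not existing systemd
code. It also ignores FAT case-insensitivity and does not write files
atomically.

```python
import os
import shutil

# Subset of the ESP that gets mirrored, relative to the ESP root.
SYNCED_DIRS = ("loader", "EFI/Linux", "EFI/systemd")


def mirror_tree(src, dst):
    """Make dst an exact copy of src: copy everything, delete extras."""
    os.makedirs(dst, exist_ok=True)
    src_names = set(os.listdir(src))
    # Remove entries that no longer exist in the source.
    for name in os.listdir(dst):
        if name not in src_names:
            path = os.path.join(dst, name)
            if os.path.isdir(path):
                shutil.rmtree(path)
            else:
                os.remove(path)
    # Copy everything from the source, unconditionally.
    for name in src_names:
        s = os.path.join(src, name)
        d = os.path.join(dst, name)
        if os.path.isdir(s):
            if os.path.isfile(d):
                os.remove(d)
            mirror_tree(s, d)
        else:
            if os.path.isdir(d):
                shutil.rmtree(d)
            shutil.copyfile(s, d)


def sync_esp(primary, secondaries):
    """Mirror SYNCED_DIRS from primary to each secondary mount point.

    Returns the secondaries that could not be synced. Failures there
    are recorded but not raised ("lax rules" for secondary copies).
    With an empty list of secondaries this does nothing.
    """
    failed = []
    for sec in secondaries:
        try:
            for sub in SYNCED_DIRS:
                src = os.path.join(primary, sub)
                dst = os.path.join(sec, sub)
                if os.path.isdir(src):
                    mirror_tree(src, dst)
                elif os.path.isdir(dst):
                    shutil.rmtree(dst)
        except OSError:
            failed.append(sec)
    return failed
```

Anything outside the listed directories on a secondary ESP, for
example another OS's boot loader, is left untouched.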

Yes, it's a bit of work.

Lennart

--
Lennart Poettering, Berlin