[systemd-devel] Running a service just before unmounting filesystems

Tue Jun 12 08:24:47 UTC 2018

On Mo, 11.06.18 17:40, Hans de Goede (hdegoede at redhat.com) wrote:

> > I am very sure it's not worth trying to maintain a shutdown_success
> > variable that is determined that early. That's a pointless exercise,
> > you won't catch 99% of relevant issues that way...
> 
> Ok, I had a quick chat with the rest of the laptop team about this
> and will just drop the shutdown_success flag.

Excellent! Thanks for reconsidering.

I mean, if there was a nice place we could store shutdown state info
at a very late point of shutdown we'd totally do that, but nothing
good has appeared so far. There are EFI variables and pstore, but
given the low quality of the memory of those things it's probably not
a good idea to write to them on every shutdown.

> But we also (Fedora 30 timeframe) want to support fastboot, where
> we don't check for a keypress at all. The problem is that scanning
> the USB bus can take quite long and some firmware skips this if
> their "fastboot" option is enabled (typically the default nowadays),
> but if we then ask for keypress / state info in grub most firmwares (*)
> will do the USB scan at that point, causing easily 2-3 seconds extra
> boot time.

The other option of course is to emphasize the "reboot into firmware"
feature of EFI more. In systemd there's "systemctl reboot --firmware-setup"
to get into the firmware setup that way. In sd-boot we also implicitly
add a menu item if the functionality is available. I figure gdm could
try to expose that feature somewhere, maybe in the top-right menu or
so?

I figure if there's need for it we could even have some mini daemon
whose only job is to provide a reboot-into-firmware hotkey during
early boot time. i.e. something that just listens to some otherwise
silent keycombo (maybe shift+alt+ctrl), and when it's pressed within
the first minute of bootup we'll instantly reboot into firmware or
so... In theory that could even be systemd-logind (which already
watches input devices for SW_DOCK and SW_LID events), but logind is
started quite late, hence maybe a separate mini daemon might be
wise...

> > Are you sure that powering up a system and powering it down
> > right-after should trigger the boot menu?
> 
> I know that that is not ideal, but who would do that anyways? This
> should happen very rarely and the side-effect is completely
> harmless.

I have the suspicion that this can happen pretty regularly. Think:
university computer pools, internet cafes and suchlike, which boot up
in the morning, and shut down at night, and might not see anyone
actually log in. (That said, not sure if computer pools and internet
cafes even still exist; maybe in some less connected country, dunno)

Maybe an approach like this could work: define two image states:
"known-good" and "dont-know". A newly installed image comes up as
"dont-know", and as soon as the system level stuff is happy it is marked
as "known-good". This part is obvious I guess. But now we'd allow the
system to be moved back to "dont-know", and iterate through this
again. The first login on the system would use this, and set the state
back to "dont-know" and then "known-good" when the login worked
fine. And you could actually use systemd to manage both: the former
would be implemented by a target unit for the PID 1 service manager,
and the latter by a similarly named target unit for the per-user service
manager.
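
To make the sequence of state changes concrete, here is a minimal
sketch in Python. It only models the transitions described above; the
class and method names are made up for illustration, and nothing here
corresponds to an existing systemd interface.

```python
from enum import Enum


class State(Enum):
    DONT_KNOW = "dont-know"
    KNOWN_GOOD = "known-good"


class Image:
    """Toy model of one boot image (hypothetical, for illustration only)."""

    def __init__(self):
        # A newly installed image starts out as "dont-know".
        self.state = State.DONT_KNOW

    def system_target_reached(self):
        # The system-level target was reached: the image is fine so far.
        self.state = State.KNOWN_GOOD

    def first_login_started(self):
        # The first login moves the image back to "dont-know".
        self.state = State.DONT_KNOW

    def user_target_reached(self):
        # The per-user target was reached: the login worked.
        self.state = State.KNOWN_GOOD
```

A server would stop after system_target_reached(), a desktop would go
through all three steps, but both use the same two states.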

With that approach you'd have a universal system again: server and
desktop systems would have the same behaviour and the same mechanisms
and you can easily convert one to the other and back.

BTW, to fill in a bit of background, which might be interesting in
this context, but is also a bit orthogonal: it was our intention to
add boot-counting and revert-to-last-working-image support to
sd-boot. The scheme is supposed to be very simple: whenever a new
kernel/initrd image is dropped into the ESP its filename would be
suffixed with a boot counter ".5". Whenever such an image is started
by sd-boot its name is first changed, decrementing the counter by
one. i.e. on the first boot of the image the counter would become
".4", and so on. Images with a counter of ".0" would not be booted
automatically anymore. And when the system managed to boot up cleanly,
the suffix would be dropped from the filename. In this scheme an image
"foo", could hence appear under the following names during its
lifetime:

- foo.5 (freshly installed, 5 tries to go)
- foo.4 (one failed boot, 4 tries to go)
- foo.3 (two failed boots, 3 tries to go)
- foo.2 (three failed boots, 2 tries to go)
- foo.1 (four failed boots, 1 try to go)
- foo.0 (five failed boots, never try this one again)
- foo (known good, pick this one again)

Decrementing the counter would be done from boot loader
context. Dropping the counter would be done from the OS, after
boot-up.
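
The renaming rules can be sketched as pure functions on file names. This
is only an illustration in Python, not sd-boot code: the function names
are hypothetical, and it assumes the base name itself does not end in
".<digits>" (real kernel file names might, so an actual implementation
would need a less ambiguous marker).

```python
import re

# Matches "<base>.<tries>", e.g. "foo.5". Assumption: the base name
# never ends in ".<digits>" on its own.
_COUNTER = re.compile(r"^(?P<base>.+)\.(?P<tries>\d+)$")


def parse(name):
    """Return (base, tries_left); tries_left is None for a known-good image."""
    m = _COUNTER.match(name)
    if m is None:
        return name, None
    return m.group("base"), int(m.group("tries"))


def bootable(name):
    """Images at ".0" are not booted automatically anymore."""
    _, tries = parse(name)
    return tries is None or tries > 0


def on_boot_attempt(name):
    """Boot loader side: decrement the counter before starting the image."""
    base, tries = parse(name)
    if tries is None or tries == 0:
        return name  # known good, or already given up on
    return f"{base}.{tries - 1}"


def on_boot_success(name):
    """OS side: drop the suffix once boot-up completed cleanly."""
    base, _ = parse(name)
    return base
```

On disk each of these transitions would be a single file rename in the
ESP, which is the whole point of the scheme.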

This scheme is really simple as the counter is stored in very
discoverable ways in the ESP, and modifiable with simple shell tools
(both Linux and EFI shells that is). It is also implementable with one
simple operation that has a great chance of working correctly in EFI's
crappy file system implementations: file rename. Moreover it's
relatively lean on metadata: simply by picking the initial name the
installer can say how many tries shall be tried before giving up.

I wonder if it would be worth agreeing on common semantics
here. i.e. extend the Bootloader spec, to document these suffixes, so
that Grub could honour them too. And then extend them slightly to
cover your case too. For example, we could also say that whenever the
boot loader decrements the counter it also increments another
one. example: foo.5 → foo.4.1 → foo.3.2 → foo.2.3 → foo.1.4 →
foo.0.5. And to implement your use case you'd then show the boot menu
automatically whenever the name has the second counter set.
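
A rough sketch of the two-counter variant, again in Python and again
with made-up function names. It assumes the name is
"<base>.<tries left>" optionally followed by ".<tries done>", and that
the base name does not itself end in ".<digits>".

```python
import re

# "<base>.<left>" or "<base>.<left>.<done>", e.g. "foo.5" or "foo.4.1".
_TWO = re.compile(r"^(?P<base>.+?)\.(?P<left>\d+)(?:\.(?P<done>\d+))?$")


def on_boot_attempt(name):
    """Boot loader side: decrement tries left, increment tries done."""
    m = _TWO.match(name)
    if m is None:
        return name  # no counters: known-good image
    left = int(m.group("left"))
    done = int(m.group("done") or 0)
    if left == 0:
        return name  # out of tries, leave untouched
    return f"{m.group('base')}.{left - 1}.{done + 1}"


def show_menu(name):
    """Show the boot menu whenever the second counter is set."""
    m = _TWO.match(name)
    return m is not None and m.group("done") is not None
```

Starting from "foo.5", repeated calls to on_boot_attempt() walk through
the same names as in the example chain above.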

> So while I have your attention, for this whole auto-hide the menu /
> determine previous boot was successful we also want to sometimes
> increment an integer grub environment variable called
> boot_indeterminate. Basically call:
> 
> grub2-editenv - incr boot_indeterminate
> 
> This is intended for reboots caused by selinux-relabels and
> offline updates.
> 
> The idea is that boot_indeterminate==1 also counts as a boot
> success (after which grub itself will increment it to 2). We don't
> want the offline-updates to set boot_success=1 as we want to detect
> an offline-updates reboot loop (as unlikely as that may be).
> 
> TL;DR: we want to call "grub2-editenv - incr boot_indeterminate"
> when doing offline updates. I could just add a service for this
> to: /lib/systemd/system/system-update.target.wants.

In the scheme I suggested above such an operation would simply be
"increase the counter again by one".

> If I understand correctly systemd will start all services under:
> /lib/systemd/system/system-update.target.wants one by one and
> the service to mark the boot indeterminate should exit with an
> error since it is not the one handling the updates (that is fine),
> but if the actual update service runs before us then we won't
> run.
> 
> I could modify all the services under /lib/systemd/system/system-update.target.wants
> with an ExecStartPre to call grub2-editenv but I would prefer
> a generic solution, any suggestions here?

Not sure I follow. I mean, setting the state to "indeterminate" should
happen whenever the offline update operation succeeded, no? If the
offline update operation fails then this should be counted as a bad
boot, no? As such your little plugin should run after
system-update.target, exactly like in the default.target regular boot
case?

Lennart

-- 
Lennart Poettering, Red Hat
