[systemd-devel] Running a service *just* before unmounting filesystems

Hans de Goede hdegoede at redhat.com
Tue Jun 12 09:33:20 UTC 2018


Hi,

On 12-06-18 10:24, Lennart Poettering wrote:
> On Mo, 11.06.18 17:40, Hans de Goede (hdegoede at redhat.com) wrote:
> 
>>> I am very sure it's not worth trying to maintain a shutdown_success
>>> variable that is determined that early. That's a pointless exercise,
>>> you won't catch 99% of relevant issues that way...
>>
>> Ok, I had a quick chat with the rest of the laptop team about this
>> and will just drop the shutdown_success flag.
> 
> Excellent! Thanks for reconsidering.
> 
> I mean, if there was a nice place we could store shutdown state info
> at a very late point of shutdown we'd totally do that, but nothing
> good has appeared so far. There are EFI variables and pstore, but
> given the low quality of the memory of those things it's probably not
> a good idea to write to them on every shutdown.

Yes, I have considered using an EFI variable too, but I too am
afraid this will damage the crappy backing store for the EFI
variables.

>> But we also (Fedora 30 timeframe) want to support fastboot, where
>> we don't check for a keypress at all. The problem is that scanning
>> the USB bus can take quite long and some firmware skips this if
>> their "fastboot" option is enabled (typically the default nowadays),
>> but if we then ask for keypress / state info in grub most firmwares (*)
>> will do the USB scan at that point, causing easily 2-3 seconds extra
>> boot time.
> 
> The other option of course is to emphasize the "reboot into firmware"
> feature of EFI more.

Yes we need "reboot into firmware" support for machines which
have fastboot enabled in the firmware, because otherwise there is no
way to get into the firmware. We actually already need this today.

But this does not really help with getting the grub menu when it
is necessary to rescue the system:

1) AFAICT this will not help with getting into grub when grub's fastboot
support is enabled and it won't even check for a key.

2) The system may be broken in such a way that the user is unable to
run the command / click the menu item for this.

> In systemd there's "systemctl reboot --firmware-setup"
> to get into the firmware setup that way.

Ah I was working on a minimal hack to do this inside grub, but I will
drop that then.

> In sd-boot we also implicitly
> add a menu item if the functionality is available.

And I've cherry-picked a patch from Ubuntu to do the same in grub
(if the menu is shown) which is something which we should have done
a long time ago.

> I figure gdm could
> try to expose that feature somewhere, maybe in the top-right menu or
> so?

Yes I need to talk to the GNOME designers about adding some advanced
reboot options somewhere:

1) Reboot into firmware setup
2) Show boot menu next boot

Any others?

I'm thinking myself to do something like what Windows does (assuming
that will help with discoverability) where shift + click on reboot shows
this menu.

> I figure if there's need for it we could even have some mini daemon
> whose only job is to provide a reboot-into-firmware hotkey during
> early boot time. i.e. something that just listens to some otherwise
> silent keycombo (maybe shift+alt+ctrl), and when it's pressed within
> the first minute of bootup we'll instantly reboot into firmware or
> so... In theory that could even be systemd-logind (which already
> watches input devices for SW_DOCK and SW_LID events), but logind is
> started quite late, hence maybe a separate mini daemon might be
> wise...

Hmm, how soon during boot is the ctrl+alt+up target available ?
We could add a .service file there which forces showing the grub
menu next time (and the grub menu will also allow entering firmware).

I already have a menu_show_once grubenv variable which gets checked
in grub.cfg-s generated (*) by the new grub2-mkconfig code I'm working
on for the auto-hide stuff, so the service file would just need
to call grub2-editenv to set that.

*) yes grub is ugly
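
As a rough sketch of what such a service would run: the menu_show_once
variable is the one mentioned above, but the value "1", the function
name and the GRUB_EDITENV override are assumptions made here for
illustration only.

```shell
#!/bin/sh
# Sketch: ask grub to show its menu on the next boot only.
# Assumptions: the value "1" and the GRUB_EDITENV override variable
# (which lets the tool be swapped out, e.g. for testing).
GRUB_EDITENV=${GRUB_EDITENV:-grub2-editenv}

request_menu_once() {
    # "-" selects the default grubenv file.
    "$GRUB_EDITENV" - set menu_show_once=1
}
```

A oneshot .service hooked to the key combo could then simply call
request_menu_once (or the grub2-editenv command directly).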


>>> Are you sure that powering up a system and powering it down
>>> right-after should trigger the boot menu?
>>
>> I know that that is not ideal, but who would do that anyways? This
>> should happen very rarely and the side-effect is completely
>> harmless.
> 
> I have the suspicion that this can happen pretty regularly. Think:
> university computer pools, internet cafes and suchlike, which boot up
> in the morning, and shutdown in the night, and might not see anyone
> actually log in. (That said, not sure if computer pools and internet
> cafes still exist even; maybe in some less connected country, dunno)
> 
> Maybe an approach like this could work: define two image states:
> "known-good" and "dont-know". A newly installed image comes up as
> "dont-know", and as soon as the system level stuff is happy is marked
> as "known-good". This part is obvious I guess. But now we'd allow the
> system to be moved back to "dont-know", and iterate through this
> again. The first login on the system would use this, and set the state
> back to "dont-know" and then "known-good" when the login worked
> fine. And you could actually use systemd to manage both: the former
> would be implemented by a target unit for the PID 1 service manager,
> and the latter by a similar named target unit for the per-user service
> manager.
> 
> With that approach you'd have a universal system again: server and
> desktop systems would have the same behaviour and the same mechanisms
> and you can easily convert one to the other and back.
> 
> BTW, to fill in a bit of background, which might be interesting in
> this context, but is also a bit orthogonal: it was our intention to
> add boot-counting and revert-to-last-working-image support to
> sd-boot. The scheme is supposed to be very simple: whenever a new
> kernel/initrd image is dropped into the ESP its filename would be
> suffixed with a boot counter ".5". Whenever such an image is started
> by sd-boot its name is first changed, decrementing the counter by
> one. i.e. on the first boot of the image the counter would become
> ".4", and so on. Images with a counter of ".0" would not be booted
> automatically anymore. And when the system managed to boot up cleanly,
> the suffix would be dropped from the filename. In this scheme an image
> "foo", could hence appear under the following names during its
> lifetime:
> 
> - foo.5 (freshly installed, 5 tries to go)
> - foo.4 (one failed boot, 4 tries to go)
> - foo.3 (two failed boots, 3 tries to go)
> - foo.2 (three failed boots, 2 tries to go)
> - foo.1 (four failed boots, 1 try to go)
> - foo.0 (five failed boots, never try this one again)
> - foo (known good, pick this one again)
> 
> Decrementing the counter would be done from boot loader
> context. Dropping the counter would be done from the OS, after
> boot-up.
> 
> This scheme is really simple as the counter is stored in very
> discoverable ways in the ESP, and modifiable with simple shell tools
> (both Linux and EFI shells that is). It is also implementable with one
> simple operation that has a great chance of working correctly in EFI's
> crappy file system implementations: file rename. Moreover it's
> relatively lean on metadata: simply by picking the initial name the
> installer can say how many tries shall be tried before giving up.
> 
> I wonder if it would be worth agreeing on common semantics
> here. i.e. extend the Bootloader spec, to document these suffixes, so
> that Grub could honour them too. And then extend them slightly to
> cover your case too. For example, we could also say that whenever the
> boot loader decrements the counter it also increments another
> one. example: foo.5 → foo.4.1 → foo.3.2 → foo.2.3 → foo.1.4 →
> foo.0.5. And to implement your usecase you'd then show the boot menu
> automatically whenever the name has the second counter set.
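
The renaming rules quoted above (including the second counter) could be
sketched in plain shell roughly as follows. The helper names are made
up for this sketch, and it assumes the image name itself has no numeric
dot-separated component (so "foo", not "foo.2"); a real implementation
would need an unambiguous way to tell counters from version numbers.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, and image names are assumed
# to have no numeric dot-separated component of their own.

# Succeed if the argument is a non-empty string of decimal digits.
is_number() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Boot loader side: print the name after one more boot attempt.
# foo.5 -> foo.4.1 -> foo.3.2 -> ... -> foo.0.5
# Names without a counter, or with 0 tries left, are printed unchanged.
next_try_name() {
    base=${1%.*}; last=${1##*.}
    if [ "$base" = "$1" ] || ! is_number "$last"; then
        printf '%s\n' "$1"          # no counter suffix: known good
        return 0
    fi
    left=$last; tried=0
    if [ "${base%.*}" != "$base" ] && is_number "${base##*.}"; then
        # Two counters present: <base>.<left>.<tried>
        left=${base##*.}; tried=$last; base=${base%.*}
    fi
    if [ "$left" -eq 0 ]; then
        printf '%s\n' "$1"          # tries exhausted: leave the name alone
    else
        printf '%s.%s.%s\n' "$base" "$((left - 1))" "$((tried + 1))"
    fi
}

# OS side: print the name with every counter suffix dropped (foo.3.2 -> foo).
known_good_name() {
    name=$1
    while [ "${name%.*}" != "$name" ] && is_number "${name##*.}"; do
        name=${name%.*}
    done
    printf '%s\n' "$name"
}

# Succeed if the name carries a second counter, i.e. an earlier attempt
# was started but the image was never marked known good.
menu_wanted() {
    base=${1%.*}
    [ "$base" != "$1" ] && is_number "${1##*.}" &&
        [ "${base%.*}" != "$base" ] && is_number "${base##*.}"
}
```

On a real ESP each step would then be a single rename, e.g. moving
"foo.5" to the name printed by next_try_name foo.5, which is the one
file operation the scheme relies on.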

Interesting. I've added Javier Martinez Canillas who is working
on implementing BLS for Fedora 29 to the Cc.

We did consider doing some scheme where we would automatically
fall back to an older kernel (setting a boot_once variable for
that kernel, rather than booting 5 times) and then, if that boot
was not marked as successful, switch back to the older kernel.

The problem is that classic Fedora Workstation is a grab bag
of bits and pieces and this scheme only takes the kernel into
account.

Whereas an update to GNOME or mesa could just as well render
the system unusable and then we still want the user to get to
the grub menu so he can enter single-user mode and do a
downgrade there. I know that for a lot of users if the system
is broken it is broken and a reinstall is the only answer, but
there is a group of users who will appreciate being able to
rescue their systems at this point.

The same problem applies to your "known good" suggestion
from above, we would need to clear "known good" as soon as
a single package changes, which makes it lose almost all its
value.

Another issue is the side-effects of failing to mark boot_success:
showing the menu is an undesirable, but otherwise we-can-live-with-it
side-effect of failing to mark boot_success when we should have.

Automatically falling back to an older kernel is a worse
side-effect. I know you were not suggesting that, with the second
counter proposal. But this is something which we've considered
and rejected for classic Fedora Workstation.

Now OTOH for Atomic where the entire kernel + Base-OS is a single
fixed entity, auto-fallback is definitely something we want.

>> So while I have your attention, for this whole auto-hide the menu /
>> determine previous boot was successful we also want to sometimes
>> increment an integer grub environment variable called
>> boot_indeterminate. Basically call:
>>
>> grub2-editenv - incr boot_indeterminate
>>
>> This is intended for reboots caused by selinux-relabels and
>> offline updates.
>>
>> The idea is that boot_indeterminate==1 also counts as a boot
>> success (after which grub itself will increment it to 2). We don't
>> want the offline-updates to set boot_success=1 as we want to detect
>> an offline-updates reboot loop (as unlikely as that may be).
>>
>> TL;DR: we want to call "grub2-editenv - incr boot_indeterminate"
>> when doing offline updates. I could just add a service for this
>> to: /lib/systemd/system/system-update.target.wants.
> 
> In the scheme I suggested above such an operation would simply be
> "increase the counter again by one".

Ack.
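
To make the boot_indeterminate rule quoted above concrete, here is a
small sketch. The function names and the GRUB_EDITENV override are
hypothetical, and the real check would live in grub's own script, not
in a shell function; the arguments merely stand in for the grubenv
variables boot_success and boot_indeterminate.

```shell
#!/bin/sh
# Sketch of the rule quoted above; names other than the two grubenv
# variables and the "incr" command are assumptions for illustration.
GRUB_EDITENV=${GRUB_EDITENV:-grub2-editenv}

# Offline-update / relabel side: bump the counter before rebooting.
mark_boot_indeterminate() {
    "$GRUB_EDITENV" - incr boot_indeterminate
}

# Succeed if the previous boot should count as successful:
# either boot_success is 1, or boot_indeterminate is exactly 1.
# Usage: previous_boot_ok <boot_success> <boot_indeterminate>
previous_boot_ok() {
    boot_success=${1:-0}
    boot_indeterminate=${2:-0}
    [ "$boot_success" -eq 1 ] || [ "$boot_indeterminate" -eq 1 ]
}
```

With this rule a single offline-update reboot still hides the menu,
while a value of 2 or more (grub's own increment plus further
indeterminate boots) no longer counts as success, which is how a
reboot loop would show up.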

>> If I understand correctly systemd will start all services under:
>> /lib/systemd/system/system-update.target.wants one by one and
>> the service to mark the boot indeterminate should exit with an
>> error since it is not the one handling the updates (that is fine),
>> but if the actual update service runs before us then we won't
>> run.
>>
>> I could modify all the services under /lib/systemd/system/system-update.target.wants
>> with an ExecStartPre to call grub2-editenv but I would prefer
>> a generic solution, any suggestions here ?
> 
> Not sure I follow. I mean, setting the state to "indeterminate" should
> happen whenever the offline update operation succeeded, no? If the
> offline update operation fails then this should be counted as a bad
> boot, no? As such your little plugin should run after
> system-update.target, exactly like in the default.target regular boot
> case?

AFAIK the service actually doing the updates is supposed to call
systemctl reboot --force when it is done, so any targets after
system-update.target won't get started ?

Regards,

Hans



