[systemd-devel] Running a service *just* before unmounting filesystems

Hans de Goede hdegoede at redhat.com
Mon Jun 11 15:40:44 UTC 2018


Hi,

On 11-06-18 16:37, Lennart Poettering wrote:
> On Mo, 11.06.18 15:37, Hans de Goede (hdegoede at redhat.com) wrote:
> 
>>> Uurks. Quite frankly, it appears strange to me to delay this for this
>>> long. I mean we reworked most code that delayed work to shutdown
>>> like this these days to happen as early as possible to make sure we
>>> don't lose state unnecessarily. For example the RTC syncing is
>>> generally done when the RTC is changed instead of synced back during
>>> shutdown. Hence, why not simply write this out when the boot is
>>> successful?
>>
>> There are 2 flags / grub environment variables in play here:
>>
>> boot_success
>> shutdown_success
>>
>> The idea being that we also want to show the grub menu if the
>> system did not shutdown cleanly (or somewhat cleanly given
>> that by the time we know we really have a clean shutdown we
>> can no longer write the grubenv).
> 
> Well, speaking from the receiving end of the bug report hose, I can
> tell you that shutdown hangs are almost exclusively happening way
> after we regularly unmount directories such as /efi or
> /boot. i.e. such hangs happen in the second phase of shutdown where we
> clean-up everything that couldn't be cleaned up during normal unit
> shutdown. (Or in fact even later, after the system returned back into
> the initrd.)
> 
> I am very sure it's not worth trying to maintain a shutdown_success
> variable that is determined that early. That's a pointless exercise,
> you won't catch 99% of relevant issues that way...

Ok, I had a quick chat with the rest of the laptop team about this
and will just drop the shutdown_success flag.

>> The feedback I've been getting on the fedora-devel list is that
>> people are somewhat worried about not being able to get to the
>> grub menu, so we are being very careful here and err-ing on the
>> side of showing the menu too often, rather than possibly not
>> often enough.
> 
> sd-boot solves that by always showing the menu if any key is pressed
> while sd-boot initializes. This means you can hold down any key you
> like during early boot and the menu is guaranteed to be shown. Why not
> do that in grub, too?

In grub, checking for "any key" is somewhat hard to do because of its
architecture, but I have written patches as part of the hidden menu effort
which will show the menu when SHIFT is held down during boot.

But we also (Fedora 30 timeframe) want to support fastboot, where
we don't check for a keypress at all. The problem is that scanning
the USB bus can take quite long and some firmware skips this if
their "fastboot" option is enabled (typically the default nowadays),
but if we then ask for keypress / state info in grub most firmwares (*)
will do the USB scan at that point, causing easily 2-3 seconds extra
boot time.

So we want to get to a setup where we don't check for any key at all
(by default, this will all be configurable).

*) Some firmwares are stupid and simply return "no key", so the grub
menu already does not work there nowadays

> I mean, if you are looking for a reliable way to get the menu back if
> things are bad, then such a shutdown hook is not going to help you,
> it's not useful to write out shutdown success info so early...

Ack.

>>> Note that /boot or /efi is very likely an automount point, (that's at
>>> least how we recommend things to be set up, as this provides the best
>>> guarantees that the ESP remains in a clean state, as it will be unmounted very
>>> quickly after the last access, and hence only be in dirty state during
>>> a very short timeframe around accesses), and in that case "right
>>> before unmount" doesn't make much sense in general, as that would be
>>> pretty much all the time (that said, I don't think fedora/Anaconda
>>> makes use of automount points for /boot and /esp, or even systemd's
>>> auto-discovery of the ESP currently, they haven't seen the light there
>>> yet, but they really really should)
>>
>> Right Fedora still uses a regular fstab entry for the ESP and changing
>> that is out of scope for what I'm working on.
> 
> Well, but it might be worth supporting such setups anyway, no? I mean,
> I am pretty sure it would be wise to not focus on legacy environments
> when designing new stuff.

Ack.

>>> Hence, my recommendation would be: write a small service that is
>>> pulled in by default.target, but orders itself after it. Then make
>>> your changes from there. i.e. do it as final steps during boot, rather
>>> than delay it to shutdown.
>>
>> See above, currently this is for Fedora Workstation only, and the plan
>> is actually to set boot_success from the systemd user session
>> (using pkexec with a user on console check to execute a new grub2-set-bootflag
>> binary which only supports changing a limited number of env variables),
>> so that we know that the user has actually logged in successfully
>> before setting the flag.
> 
> Are you sure that powering up a system and powering it down
> right-after should trigger the boot menu?

I know that that is not ideal, but who would do that anyways? This
should happen very rarely and the side-effect is completely harmless.

OTOH with the fastboot stuff planned for Fedora 30 not being able to
log in while the boot_success flag does get set is really bad, so I
think this is a good compromise.

>>> note that there have been plans of introducing some generic framework
>>> for such "boot completion" tests, as it is useful for a number of
>>> usecases, for example Atomic would like to use that. Such a framework
>>> would be very minimal most likely: add a new generically named target,
>>> before which all "is all good" checkers would be ordered, and after
>>> which all "mark the boot as successful" services are ordered. Your
>>> grub service would fit in perfectly in the latter then.
>>
>> So something like this would make sense for server / container
>> scenarios but not really for Fedora WS, if gdm starts but the
>> keyboard is not functional so the user cannot log in we still
>> want to show the menu the next boot so that the user can say
>> try an older kernel.
>>
>> And even in server / container scenarios ideally asserting
>> success would come from a service which checks that say a http
>> connection can be made or whatever depending on the role of
>> the server...
> 
> Precisely, that's why the target unit would be pluggable: downstreams
> can plug anything they like before it, so that the target would never
> be reached if by any of the deps of the new target the system would
> not be considered to be up. it's then up to downstream to define
> services and plug them in.

That sounds like a good idea, but for Fedora Workstation we really
want to signal boot_success from the user-session. Let's say that
everything works fine in gdm and even login works, but then the
user-session gnome-shell crashes immediately. That still leaves
the system unusable (Fedora WS has sshd disabled by default).

So I agree with this plan for non-interactive installs, but for
installs where the primary use is interactive I believe it is
important to assert (as much as possible) that the user is
actually interacting with the system as a condition to
consider the boot successful.

> Please, this deserves some discussion before you implement this. And
> shutdown hooks are really not the way to go.

I agree this deserves some discussion, which is why we are
having this discussion now :)

###

So while I have your attention, for this whole auto-hide the menu /
determine previous boot was successful we also want to sometimes
increment an integer grub environment variable called
boot_indeterminate. Basically call:

grub2-editenv - incr boot_indeterminate

This is intended for reboots caused by selinux-relabels and
offline updates.

The idea is that boot_indeterminate==1 also counts as a boot
success (after which grub itself will increment it to 2). We don't
want the offline-updates to set boot_success=1 as we want to detect
an offline-updates reboot loop (as unlikely as that may be).
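
To make the intended semantics concrete, here is a rough shell model of the
decision described above. It is only an illustration: the real check would be
done by grub itself at boot, and the function name menu_decision and its
"action new-value" output format are made up for this sketch.

```shell
#!/bin/sh
# Rough model of the hide/show decision; not the actual grub config.
# menu_decision is a hypothetical name.
# Args: <boot_success> <boot_indeterminate>
# Prints: "<hide|show> <boot_indeterminate after grub has looked at it>"
menu_decision() {
    boot_success=$1
    boot_indeterminate=$2

    # A successful previous boot, or exactly one indeterminate boot
    # (selinux relabel, offline update), allows hiding the menu.
    if [ "$boot_success" = "1" ] || [ "$boot_indeterminate" = "1" ]; then
        action=hide
    else
        action=show
    fi

    # An indeterminate boot may hide the menu only once: grub bumps 1 to 2,
    # so a reboot loop brings the menu back on the next boot.
    if [ "$boot_indeterminate" = "1" ]; then
        boot_indeterminate=2
    fi

    echo "$action $boot_indeterminate"
}

menu_decision 1 0   # previous boot was marked successful
menu_decision 0 1   # first reboot into an offline update
menu_decision 0 3   # update rebooted again: 2 was incremented to 3
```

The last call models the reboot-loop case: once the value has moved past 1,
only a real boot_success=1 hides the menu again.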

TL;DR: we want to call "grub2-editenv - incr boot_indeterminate"
when doing offline updates. I could just add a service for this
to: /lib/systemd/system/system-update.target.wants.
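
For reference, an untested sketch of the kind of service I have in mind. The
unit name grub-boot-indeterminate.service and the /usr/bin path are
assumptions, and the snippet writes the unit into a scratch directory rather
than installing it. It deliberately leaves the ordering against the real
update services open.

```shell
#!/bin/sh
# Write a sketch unit into a temporary directory (nothing is installed).
# Unit name and binary path are assumptions made for illustration.
unit_dir=$(mktemp -d)
unit_file="$unit_dir/grub-boot-indeterminate.service"

cat > "$unit_file" <<'EOF'
[Unit]
Description=Mark the current boot as indeterminate
# Runs in the minimal offline-update environment
DefaultDependencies=no
Requires=sysinit.target
After=sysinit.target
# Open question: how to order this before the actual update service

[Service]
Type=oneshot
ExecStart=/usr/bin/grub2-editenv - incr boot_indeterminate

[Install]
WantedBy=system-update.target
EOF

echo "sketch written to $unit_file"
```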

If I understand correctly systemd will start all services under:
/lib/systemd/system/system-update.target.wants one by one and
the service to mark the boot indeterminate should exit with an
error since it is not the one handling the updates (that is fine),
but if the actual update service runs before us then we won't
run.

I could modify all the services under /lib/systemd/system/system-update.target.wants
with an ExecStartPre to call grub2-editenv but I would prefer
a generic solution, any suggestions here?

Regards,

Hans

