[systemd-devel] failing unmounts during reboot

Lennart Poettering lennart at poettering.net
Thu Jul 25 11:34:04 UTC 2019


On Thu, 25.07.19 11:40, Frank Steiner (fsteiner-mail1 at bio.ifi.lmu.de) wrote:

> Hi,
>
> I'm currently discussing a problem with the SuSE support about failing
> unmounts during reboot. Trying to debug this I realized that systemd
> is not killing processes left over by some init.d script. E.g. use
> the following script in /etc/init.d/
>
> #!/bin/sh
> #
> ### BEGIN INIT INFO
> # Provides: bla
> # Required-Start: $network $remote_fs sshd
> # Required-Stop: $network $remote_fs sshd
> # Default-Start:  2 3 5
> # Description:    test
> ### END INIT INFO
> case "$1" in
>      start) cd /test; /usr/bin/sleep 99d & ;;
>       stop) true;;
> esac

When generating native unit files from legacy sysv scripts, we use
KillMode=process, which means we'll only kill the main process, and
nothing else. This choice was made since its behaviour comes closest
to classic SysV behaviour (since there the init system didn't kill any
auxiliary processes either).

Given it's 2019, it might be wise to just write a native unit file if
you want better control over this. Note that native unit files use
different defaults: there we kill everything by default.
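For the example script above, a native unit might look roughly like the
following. This is only a sketch under some assumptions: the unit name
bla.service is taken from "Provides: bla", and the translation of the LSB
headers into After=/WantedBy= is an approximation, not what the SysV
generator actually emits. A native unit with the same name should take
precedence over the one generated from /etc/init.d/bla; after installing
it, run "systemctl daemon-reload" and "systemctl enable --now bla.service".

    # /etc/systemd/system/bla.service (sketch, name assumed from "Provides: bla")
    [Unit]
    Description=test
    # Approximation of "Required-Start: $network $remote_fs sshd"
    After=network.target remote-fs.target sshd.service

    [Service]
    # Replaces the "cd /test" from the script
    WorkingDirectory=/test
    # Runs in the foreground, so no "&" and no forking are needed
    ExecStart=/usr/bin/sleep 99d
    # Already the default for native units, spelled out here:
    # on stop, every process in the unit's cgroup is killed
    KillMode=control-group

    [Install]
    # Approximation of "Default-Start: 2 3 5"
    WantedBy=multi-user.target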

You can also reuse the generated unit and change only the KillMode=
setting, by creating a drop-in in
/etc/systemd/system/<myservice>.service.d/<something>.conf and
adding this to it:

    [Service]
    KillMode=control-group

But let me underline: a SysV script which leaves processes around is
simply buggy. In sysv it was expected that scripts would clean up
properly on "stop", on their own. If you don't do that in your script,
then maybe fix that...

> On shutdown, unmounting /test will fail because the sleep process is
> not killed. Shouldn't there be a mechanism in systemd to kill processes
> spawned by LSB script when shutting these down?

Well, quite frankly, whichever way we do it, people will be upset. If we
kill all processes of a service on stop, people tell us: "SysV didn't
kill all processes on service stop, why do you?" Now you say the
opposite: "Why don't you kill all service processes on stop? You
should!" There's no way out of that.

If you ask me, just forget about SysV init scripts in 2019, and spend
the 15 minutes to put together a native unit. It will save you
frustration in the long run, and it fixes all these issues.

Also note that we live in a world where various kinds of storage
(including much of NFS) require local services to be running in order
to operate. Because of that we can't just decide "oh, it's time to tear
down NFS, let's have PID 1 kill *every* process", because then the whole
system will be borked.

Really, the only correct way to shut things down is with proper
ordering between the individual units, but that falls apart if you
stick too closely to historic SysV semantics.
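One way to express such ordering for the example above is a drop-in that
ties the service to the mount it uses. RequiresMountsFor= adds Requires=
and After= dependencies on the mount units needed to access the given
path, and since units are stopped in reverse start order, the service is
then stopped before /test is unmounted. The drop-in file name below is
made up. Note that with the generated unit's KillMode=process the stray
sleep would still survive the stop, so this only helps together with the
KillMode= change or a native unit.

    # /etc/systemd/system/bla.service.d/mounts.conf (hypothetical file name)
    [Unit]
    # Pulls in and orders after test.mount; at shutdown the order is
    # reversed, so bla.service is stopped before /test is unmounted
    RequiresMountsFor=/test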

>
> And moreover, wouldn't it make sense to have a mechanism to at least
> try to kill all processes using a filesystem before unmounting it?

There's no sensible API for that. Moreover, this should be entirely
unnecessary with correctly behaving services. It's just that you wrote
a broken one...

> We often see failing unmounts of several local or iscsi fs during
> reboot, and in the support case we are currently working on with SuSE
> failing iscsi fs even cause xfs I/O errors. So it might be a good idea
> to have something like an lsof + kill before unmounting a filesystem, maybe
> configurable with a flag to enable or disable it. Even if lsof or kill
> failed, it wouldn't be worse than now.

lsof is a) slow (it searches all of /proc), b) racy (because it won't
properly grok fds coming and going), and c) incomplete (we live in a
world of PID namespaces these days). This is a hack on top of a hack
really, let's not do that.

> As far as I see there is no way to write a drop-in for a mount unit
> that allows to execute commands before the unmount happens, is that
> right? Something like "ExecPreUmount=" would help here, especially if there
> was something like a umount@.service that would be called for every umount
> with e.g. the mountpoint accessible via a variable.

We deliberately didn't add that, since we wanted to make sure that what
systemd does is mostly reproducible with a plain "mount" command on
the shell...

You can do this manually, though it's really a hack: just write a
service, order it After= the specific mount, and
Before=local-fs.target. But it's going to be super racy, and a poor
workaround for missing ordering dependencies.
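For illustration only, a sketch of that hack, with all the caveats above
still applying. It assumes the filesystem is mounted at /test (so the
mount unit is test.mount) and that fuser from psmisc is installed as
/usr/bin/fuser (the path varies between distributions); the unit name is
hypothetical. DefaultDependencies=no is needed because the implicit
After=sysinit.target of a normal service, with sysinit.target itself
ordered after local-fs.target, would form an ordering cycle with
Before=local-fs.target.

    # /etc/systemd/system/kill-test-users.service (hypothetical name)
    [Unit]
    Description=Kill processes still using /test before unmounting (racy hack)
    # Avoid the implicit After=sysinit.target, which would create a cycle
    DefaultDependencies=no
    Requires=test.mount
    After=test.mount
    Before=local-fs.target
    # Make sure the unit is stopped, and ExecStop= runs, on shutdown
    Conflicts=shutdown.target
    Before=shutdown.target

    [Service]
    Type=oneshot
    RemainAfterExit=yes
    ExecStart=/bin/true
    # Runs on stop, i.e. before test.mount is stopped. "-" ignores the
    # non-zero exit status fuser returns when nothing uses /test
    ExecStop=-/usr/bin/fuser -k -m /test

    [Install]
    WantedBy=local-fs.target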

Long story short: fix your deps, write proper units.

Lennart

--
Lennart Poettering, Berlin

