[systemd-devel] [PATCH 1/4] Adding halt binary to shutdown the system

Fri Oct 1 06:52:25 PDT 2010

On Fri, 01.10.10 02:28, Gustavo Sverzut Barbieri (barbieri at profusion.mobi) wrote:

> From: Fabiano Fidencio <fidencio at profusion.mobi>
> 
> This functions are working as follows:
>     - Send a SIGTERM to all process
>     - Send a SIGKILL to all process
>     - Try to umount all mount points
>     - Try to remount read-only all mount points that can't
>     be umounted
>     - Call shutdown
> 
>     If one step fail, shutdown will be aborted

This shouldn't fail, ever. We need to make sure that we don't ever hang in
the middle of nowhere.

> +	systemd-halt \

Could we name this "systemd-shutdown" please? I kinda see "shutdown" as
a higher level term for all of "halt", "reboot", "poweroff" and "kexec".

>  	systemd-modules-load \
>  	systemd-remount-api-vfs \
>  	systemd-kmsg-syslogd \
> @@ -349,7 +350,7 @@ libsystemd_core_la_SOURCES = \
>  	src/service.c \
>  	src/automount.c \
>  	src/mount.c \
> -	src/umount.c \
> +	src/halt.c \

Hmm, this doesn't belong in libsystemd_core, and neither does umount.c.

> +static void sig_alarm(int sig) {
> +        run = 0;
> +}

Grmbl. Using signals like this is racy, since they might get delivered
between the check for 'run' in the loop below, and the next call to
wait(). The effect of that is the timout might get eaten up and the
system freezes.

Or in other words: never ever use SIGLARM or alarm().

> +int main(int argc, char *argv[]) {
> +        int c, cmd;
> +
> +        if (getpid() != 1)
> +                return -1;

Returning -1 in main() is not really defined.

> +
> +        if (argc != 2)
> +                return -1;
> +
> +        switch ((c = getopt(argc, argv, "hrp"))) {
> +                case 'h':
> +                        cmd = LINUX_REBOOT_CMD_HALT;

There's a break missing here.

> +                case 'p':
> +                        cmd = RB_POWER_OFF;
> +                        break;
> +                case 'r':
> +                        cmd = RB_AUTOBOOT;
> +                        break;
> +                default:
> +                        return -1;
> +        }

I'd kinda prefer a proper verb on the command line here. I.e. invoking
"systemd-shutdown halt" should halt the system, "systemd-shutdown
reboot" should reboot it, and so on.

> +
> +        /* lock us into memory */
> +        mlockall(MCL_CURRENT|MCL_FUTURE);
> +
> +        signal(SIGALRM, sig_alarm);
> +        kill(-1, SIGTERM);

Unfortunately this will probably not be sufficient. Some PIDs probably
must be spared (for example: mdmon and stuff). While I am not entirely
sure how we best should allow apps to mark themselves to be spared here
(my ideal solution would be to patch the kernel to allow user xattrs on
/proc/$PID...), I think we must iterate through the processes and kill
things individually, potentially omitting a few while doing that.

If you do this then it is probably wise to make sure that the SIGTERM is
delivered for all processes at the same time and not one after the
other, hence it is recommended to use kill(-1, SIGSTOP) before and
kill(-1, SIGCONT) afterwards. This is btw what the sysv implementation
of killall5 does.

> +        alarm(10);

Do not use alram.

> +        while (run) {
> +                pid_t r = wait(NULL);
> +                if (r == (pid_t)-1 && errno == ECHILD)
> +                        run = 0;
> +        }

While I like the idea of using wait() like this it won't work anymore if
we need to spare some processes. 

You can use sigtimedwait() + waitpid(WNOHANG) to wait for SIGCHLD and
apply a timeout.

Make sure that if you iterate through the processes you count the
processes you killed so that we know how many processes to wait for.

> +        alarm(0);
> +        kill(-1, SIGKILL);

A similar loop is needed here.

> +
> +        if (umount_init() < 0)
> +                return -1;
> +
> +        sync();

This is redundant, since the kernel does that anyway when you call
reboot(). We probably should remove all invocations of sync()
everywhere.

> +        return reboot(cmd);
> +}

Here are a couple of things I'd still like to see added, but which are
not necessary to get the basic version merged:

- use the devicemapper libraries to remove all dm disks after each
unmounting run that can be removed (in particular useful for crypto
disks). Also remove all loopback devices (i.e. /dev/loop).

- we need code that disables all swaps, similar to the code that umounts
everything. This should probably be integrated into the umount loop, to
properly deal with swap files.

- As a minor optimization we might want an umount run for all
non-API tmp disks before disabling all swaps. Why? Because when removing
the swaps everything that is swapped out on it will be swapped back into
memory. If this is tmpfs data that is not necessary, hence let's unmount
the tmpfs stuff first and then remove the swaps. (the fedora reboot
script does this). For the API tmpfs file systems (i.e. /sys/fs/cgroup
and /dev this should not be necessary since it does not store any big
files).

- having support for kexec would be cool. As last step simply invoke
/sbin/kexec -e -x -f, and if that fails fall back to a normal reboot.

The umount/unswap loop should probably look like this:

do {
        p = umount_all();
        q = unswap_all();
        r = undm_all();
        s = unloop_all();
} while (p > 0 || q > 0 || r > 0 || s > 0);

where the various xxx_all() functions return the number of
unmounts/unspaws/undms/unloops they executed.

Lennart

-- 
Lennart Poettering - Red Hat, Inc.