[systemd-devel] [PATCH 1/4] Adding halt binary to shutdown the system

Fri Oct 1 10:43:32 PDT 2010

On Fri, 01.10.10 11:37, Gustavo Sverzut Barbieri (barbieri at profusion.mobi) wrote:

> > If you do this then it is probably wise to make sure that the SIGTERM is
> > delivered for all processes at the same time and not one after the
> > other, hence it is recommended to use kill(-1, SIGSTOP) before and
> > kill(-1, SIGCONT) afterwards. This is btw what the sysv implementation
> > of killall5 does.
> 
> we saw the STOP->sig->CONT, but I asked him to remove as we could do
> it better being pid1. His first implementation was very much like
> sysv.
> 
> can you confirm we need this for mdmon and all? If they just ignore
> SIGTERM, all we have to do is try to umount devicemapper before
> SIGKILL.

Well, the fact that mdmon needs that sucks, and I can only say that I
find this a major design flaw of MD. Kay and I have tried to track down
what this is necessary for, and it seems it is needed to support the
on-disk format of some hw raid adapters.

But be that as it may, we probably do have to support this in some
way. How exactly the matching will work in the end we can leave open,
but we do need the loop unfortunately.

(The whole mdmon idea is just broken, because if it really wants to stay
around it must be able to reexec itself to release mmapped files on
disk. But AFAICS this isn't the case)

> > A similar loop is needed here.
> 
> why? things will just die... but yeah, the loop should exit quickly.

AFAIK SIGKILL isn't any different from other signals and hence is
asynchronous, and before you go on and unmount the file systems you must
make sure all processes are really gone so that they don't keep the file
system busy.
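
As a rough sketch of the two pieces discussed above (signal everybody
"at once" via STOP/sig/CONT, then loop until everything is really
gone), assuming we run as PID 1 so that every surviving process ends up
reparented to us. The names broadcast_signal() and reap_until_gone()
are made up for illustration and are not part of the patch:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: freeze the targets, queue sig, then thaw them,
 * so that all of them see the signal at roughly the same time.
 * PID 1 would pass target = -1. */
static int broadcast_signal(pid_t target, int sig) {
        int r;

        if (kill(target, SIGSTOP) < 0 && errno != ESRCH)
                return -errno;

        r = kill(target, sig) < 0 ? -errno : 0;

        /* Always thaw again, even if queueing the signal failed. */
        (void) kill(target, SIGCONT);
        return r;
}

/* Hypothetical helper: signals are asynchronous, so poll until there
 * are no children left to wait for, or until the timeout expires. */
static int reap_until_gone(int timeout_ms) {
        const struct timespec delay = { 0, 10 * 1000 * 1000 }; /* 10ms */
        int waited_ms = 0;

        assert(timeout_ms >= 0);

        for (;;) {
                pid_t p = waitpid(-1, NULL, WNOHANG);

                if (p > 0)
                        continue;          /* reaped one, look for more */
                if (p < 0 && errno == ECHILD)
                        return 0;          /* nothing left, safe to umount */
                if (p < 0 && errno != EINTR)
                        return -errno;
                if (waited_ms >= timeout_ms)
                        return -ETIMEDOUT;

                nanosleep(&delay, NULL);
                waited_ms += 10;
        }
}
```

In the halt binary this would be called as broadcast_signal(-1, SIGTERM)
followed by a wait, then broadcast_signal(-1, SIGKILL) followed by
another wait, and only then the unmount loop.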

> >> +
> >> +        if (umount_init() < 0)
> >> +                return -1;
> >> +
> >> +        sync();
> >
> > This is redundant, since the kernel does that anyway when you call
> > reboot(). We probably should remove all invocations of sync()
> > everywhere.
> 
> I've added it during debug, I read the man page that reboot() did not
> ensure sync, and in the process I've lost some of my data... but at
> the end it seems that the reason was something else and this ended
> dangling... but it does not hurt either ;-)

Interesting. I am pretty sure the man page isn't correct here
though. But for now it's probably better to leave the sync in.

> > - use the devicemapper libraries to remove all dm disks after each
> > unmounting run that can be removed (in particular useful for crypto
> > disks). Also remove all loopback devices (i.e. /dev/loop).
> 
> shouldn't this be done BEFORE going to this step? Like in the regular
> systemd units? When we run this binary we should be pretty much done,
> we're just giving things a last chance, but in a fully systemd
> compliant system we should have no more left processes when their
> units are stopped.

Well, the whole binary is mostly a safety net anyway, so yes, everything
should ideally happen cleanly before, but that doesn't relieve us from
coding this safety net correctly.

> if you worry about a unit not being unmounted due to processes keeping
> it busy, maybe we can also introduce a non-pid1 killall that SIGTERMs
> all processes and that should help dm.

Kay and I have discussed this a bit, and what we came up with is a
scheme like this: there should be a little .service that is active as
long as a user can log in. When it starts up it removes the /etc/nologin
file, and when it shuts down it recreates it and goes through all
sessions in /sys/fs/cgroups/user/ and kills them. It should be one of
the last services started and the first to be shut down. Its service
file should probably look something like this:

[Service]
ExecStart=/lib/systemd/systemd-sessions start
ExecStop=/lib/systemd/systemd-sessions stop
RemainAfterExit=yes
Type=oneshot

And then /lib/systemd/systemd-sessions should be a simple binary that
manages /etc/nologin and kills those sessions.

(Oh, and we probably should also manage /var/run/nologin or so, since
/etc/nologin is kinda sucky on r/o root dirs)
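
The nologin half of such a binary could look roughly like the sketch
below. It only covers creating and removing the file; killing the
sessions is left out. The function name nologin_update() and the
message text are invented here, and the path is a parameter so that
either /etc/nologin or /var/run/nologin could be passed:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: block_logins=false is the "start" case (remove
 * the file), block_logins=true is the "stop" case (recreate it). */
static int nologin_update(const char *path, bool block_logins) {
        static const char msg[] = "System is going down.\n";
        int fd, r = 0;

        assert(path);

        if (!block_logins) {
                /* A file that is already gone is not an error. */
                if (unlink(path) < 0 && errno != ENOENT)
                        return -errno;
                return 0;
        }

        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
                return -errno;

        if (write(fd, msg, sizeof(msg) - 1) != (ssize_t) (sizeof(msg) - 1))
                r = -EIO;

        close(fd);
        return r;
}
```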

> and do we really need to remove the devices themselves? isn't
> unmounting their filesystems enough? In my not-so-experienced point of
> view the data is already on media and removing the loop/crypto/dm
> devices will just free system memory, but we're exiting linux anyway
> (it's pretty much like calling free() before ending a process).

Well, consider a file system on a loop device which is created from a
file on some file system. To unmount the latter fs you need to unmount
the former fs AND remove the loop device, otherwise you will get EBUSY
if you try to unmount the latter fs.

Getting rid of the loop device is actually easy. The equivalent of
"losetup -d" is a LOOP_CLR_FD ioctl call on the loop device. Really
trivial to implement.
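
A minimal sketch of that ioctl call, for illustration only. The
function name loop_detach() is made up, and error handling is reduced
to returning a negative errno value:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/loop.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: detach the backing file from a loop device,
 * the same thing "losetup -d <device>" does. */
static int loop_detach(const char *device) {
        int fd, r = 0;

        assert(device);

        fd = open(device, O_RDONLY);
        if (fd < 0)
                return -errno;

        /* Fails with EBUSY while a file system on it is still mounted. */
        if (ioctl(fd, LOOP_CLR_FD, 0) < 0)
                r = -errno;

        close(fd);
        return r;
}
```

An EBUSY result here would simply mean "try again after the next
unmounting run".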

> > - As a minor optimization we might want an umount run for all
> > non-API tmp disks before disabling all swaps. Why? Because when removing
> > the swaps everything that is swapped out on it will be swapped back into
> > memory. If this is tmpfs data that is not necessary, hence let's unmount
> > the tmpfs stuff first and then remove the swaps. (the fedora reboot
> > script does this). For the API tmpfs file systems (i.e. /sys/fs/cgroup
> > and /dev) this should not be necessary since it does not store any big
> > files).
> 
> so we just do a pre-loop to umount tmpfs, then swap, then the regular
> one?

I'd integrate this into a single loop, to deal with weird stuff such as
fs-on-loop-on-tmpfs-on-swap-on-fs-on-loop-....
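
The "single loop" idea can be sketched as a fixed-point iteration: run
every kind of teardown pass in turn and repeat as long as any of them
made progress. Everything below is a toy model, not the real code: the
stack of layers stands in for the mount/swap/loop state, and all names
are invented for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* One teardown pass; returns how many objects it managed to remove. */
typedef int (*teardown_pass_t)(void *userdata);

/* Repeat all passes until a full round removes nothing anymore. */
static int teardown_until_stable(const teardown_pass_t *passes, size_t n,
                                 void *userdata) {
        int total = 0, progress;

        assert(passes || n == 0);

        do {
                progress = 0;
                for (size_t i = 0; i < n; i++) {
                        int k = passes[i](userdata);
                        if (k > 0)
                                progress += k;
                }
                total += progress;
        } while (progress > 0);

        return total;
}

/* Toy model: only the topmost layer of the stack can be removed. */
enum layer { LAYER_FS, LAYER_LOOP, LAYER_SWAP };

struct stack {
        const enum layer *layers;   /* bottom first */
        size_t n_layers;
};

static int remove_top(struct stack *s, enum layer kind) {
        int removed = 0;

        while (s->n_layers > 0 && s->layers[s->n_layers - 1] == kind) {
                s->n_layers--;
                removed++;
        }
        return removed;
}

static int pass_umount(void *u)  { return remove_top(u, LAYER_FS); }
static int pass_swapoff(void *u) { return remove_top(u, LAYER_SWAP); }
static int pass_loop(void *u)    { return remove_top(u, LAYER_LOOP); }
```

In the real binary the passes would be the umount, swapoff, loop and dm
removal runs, and whatever is still left once the loop stops making
progress would be logged and ignored.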

> > - having support for kexec would be cool. As last step simply invoke
> > /sbin/kexec -e -x -f, and if that fails fall back to a normal reboot.
> 
> do we really have to exec /sbin/kexec or is it possible to do some
> syscall?

Unless you want to reimplement the complex kernel loader logic,
/sbin/kexec is the only solution.
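
For illustration, the "try kexec, fall back to reboot" step might look
like the sketch below. The helper name try_kexec() is made up, and the
flags are simply the ones from the suggestion quoted above:

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: replace the current process with
 * "kexec -e -x -f". On success this never returns; on failure it
 * returns a negative errno value so the caller can fall back. */
static int try_kexec(const char *kexec_path) {
        char *const argv[] = {
                (char*) kexec_path,
                (char*) "-e", (char*) "-x", (char*) "-f",
                NULL
        };

        assert(kexec_path);

        execv(kexec_path, argv);
        return -errno;      /* only reached if execv() failed */
}

/* Intended use as the very last step, after everything is unmounted:
 *
 *         if (try_kexec("/sbin/kexec") < 0)
 *                 reboot(RB_AUTOBOOT);
 */
```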

Lennart

-- 
Lennart Poettering - Red Hat, Inc.