[systemd-devel] BUG: several bugs in core/main.c (v218)

Mon Jan 26 16:48:32 PST 2015

On Sun, 25.01.15 03:37, Tomasz Pawlak (tomazzi at wp.pl) wrote:

> core/main.c:1519
>       /* Make sure we leave a core dump without panicing the
>        * kernel. */
>       if (getpid() == 1) {
>               install_crash_handler();
> 
>               r = mount_cgroup_controllers(arg_join_controllers);
>               if (r < 0)
>                       goto finish;
>       }
> 
> core/main.c:226
> static void install_crash_handler(void) {
>       struct sigaction sa = {
>               .sa_handler = crash,
>               .sa_flags = SA_NODEFER,
>       };
> 
>       sigaction_many(&sa, SIGNALS_CRASH_HANDLER, -1);
> }
> 
> /shared/util.c:2207
> int sigaction_many(const struct sigaction *sa, ...) {
>       va_list ap;
>       int r = 0, sig;
> 
>       va_start(ap, sa);
>       while ((sig = va_arg(ap, int)) > 0)
>               if (sigaction(sig, sa, NULL) < 0)
>                       r = -errno;
>       va_end(ap);
> 
>       return r;
> }
> 
> shared/def.h:40
> #define SIGNALS_CRASH_HANDLER SIGSEGV,SIGILL,SIGFPE,SIGBUS,SIGQUIT,SIGABRT
> 
> 
> BUGS:
> 1. Ignoring return value from sigaction_many(): all sig handlers can
> silently fail to install, thus leaving the whole process unprotected

Well, if the kernel, policy, valgrind, some arch or whatever else
disallows us to install a crash handler for one of the signals, than
we won't have one on it, but I don't think that's a problem. We do the
best we can regarding setting up crash handling and if we can't we
just proceed. I mean, if your system crashes then you are fucked
anyway - the crash handler is something to makes it slighlty less
annyoing, but essentially you are still fucked, there's no way around
it.

Hence, I think ignoring the result of sigaction_many() is really the
best thing we can do: we make the best of the situation, for a
peripheral feature.

> 2. SIGSEGV should be catched by a handler running on a separate
> stack (SA_ONSTACK) - otherwise it can cause segfault itself, when
> the first SIGSEGV which triggered the handler is caused by stack
> overflow.

I'd take a patch for this, but in general: the crash handler is
nothing that is supposed to recover your system. All it does its close
all fds, fork, and coredump in the child while freezing in the
parent. That makes sure that all communication with PID 1 is severed
(so that clients talking to it don't hang), we get a clean coredump,
but the kernel doesn't oops immediately. That's really all there is to
it.

So far we never ran into stack overflow issues, hence nobody was ever
tempted to cover that case too, and set up a separate stack. In
particular since doing this without testing it is pointless...

> 3. SA_NODEFER makes no sense, since the handler is expected to catch
> only first critical signal. With SA_NODEFER nesting of signals is
> possible, what can cause unpredictable results, including uncatched
> stack overflow caused by the handler itself.

We set SA_NODEFER so that the signal handler can raise the signal
again in the child process so that we get clean coredump for the logs.

Anyway, given that this all is not obvious to see from the sources I
have now added a couple of comments there.

Lennart

-- 
Lennart Poettering, Red Hat