[systemd-devel] Odp: Re: BUG: several bugs in core/main.c (v218)

Mon Jan 26 17:06:19 PST 2015

On Mon, 26.01.15 00:33, Tomasz Pawlak (tomazzi at wp.pl) wrote:

> You are right, but it's not as simple as it may look at first sight:
> 
> 1. If we allow the process to continue without sig handlers
> installed, then results can be just catastrophic: kernel panic with
> all the services launched -> broken transactions, half-written
> records/files, etc -> total mess, corrupted or lost data etc.  So,
> since successful installation of the sig handlers is one of the
> most critical parts of initialisation, it is actually safer to just
> quit. This is just a critical fault (and is currently completely
> ignored).

Hmm? No. If PID 1 dies then either the kernel halts PID 1 or we do.

> 2. Another thing is that those signals are not equally
> important, e.g. SIGABRT can be raised by thousands of lines of code in
> this project (by abort()), so it is much more likely that assertion
> checking will prevent segfaults, raising SIGABRT instead. This
> means that SIGABRT is actually far more probable than SIGSEGV.  This
> in turn leads to a simple solution: the process should unconditionally
> exit if the handler for SIGABRT has failed to install, but with other
> sig handlers failed, we may take a risk and continue.  In any case,
> such a situation should be logged as soon as possible.  Ignoring this
> is just asking for catastrophe.

The only thing you can do to recover from SIGABRT or SIGSEGV is
reexec()ing yourself from the sig handler. That's something the kernel
doesn't allow for PID 1 however...

It's illusory to believe that you could just do some magic, and
return from SIGSEGV and continue running your program. You
cannot. SIGSEGV is more often than not an indication of memory
corruption, and if that happens, there's no way to bring back the
memory to a state where things are good again, because memory doesn't
tell you if it's in a good or bad state.

> 3. SIGFPE: how often does the code use the FPU? -> I mean that the
> handler for this sig can be dynamically installed/uninstalled when
> needed, probably only on a thread level, not for the whole process.
> This will allow failed sigaction calls to be reported completely
> safely by assertion checking.

SIGFPE is also triggered by integer divisions by zero (yeah, the name
is misleading). 
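
To illustrate the point about integer division, here is a minimal
sketch (not code from systemd; the function name is made up for this
example). It performs an integer division by zero in a forked child
and reports which signal terminated it. Note that this is undefined
behaviour in C, so the result is platform-dependent: on x86 Linux the
child is typically killed by SIGFPE, while some other architectures do
not trap at all and the child simply exits.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run an integer division by zero in a child
 * process and report which signal (if any) terminated it.
 * Returns the signal number, 0 if the child exited normally, or -1 if
 * fork() or waitpid() failed. */
static int signal_from_integer_div_by_zero(void) {
        pid_t pid = fork();
        if (pid < 0)
                return -1;

        if (pid == 0) {
                /* volatile forces the division to happen at run time
                 * instead of being folded away by the compiler */
                volatile int zero = 0;
                volatile int result = 1 / zero;
                (void) result;
                _exit(0);
        }

        int status;
        if (waitpid(pid, &status, 0) < 0)
                return -1;

        return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

No floating point is involved anywhere in the child, yet where the
hardware traps on the division the reported signal is SIGFPE.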

Catching SIGFPE, SIGSEGV, SIGABRT and so on is for software problems
that we don't expect. If we expected them then we could certainly
handle them in a nicer way than getting a signal thrown...

> 4. So, sigaction_many() should be removed (also because it is a
> vararg function, which is a rather bad idea), and a function for

Ahum? vararg is bad now? I must have missed that memo. Why would it be
bad? Do you write C code without printf() (which is varargs)?

> registering one sig handler at a time should be used. Then, we can
> tell (log) which signals were not registered by sigaction, and make
> a conscious decision about what to do next.

We actually want to handle failure of installing these crash handlers
all in the same way: by mostly ignoring them, and proceeding.

Lennart

-- 
Lennart Poettering, Red Hat