[systemd-devel] Odd status after core dump

Lennart Poettering lennart at poettering.net
Fri Jul 31 19:57:37 UTC 2020


On Fri, 31.07.20 11:52, Ulrich Windl (Ulrich.Windl at rz.uni-regensburg.de) wrote:

> Hi!
>
> For systemd-228-157.12.5.x86_64 (SLES12 SP5) I had an odd status:
> Three services were restarted in parallel, and one of them failed (the stop
> routine killed the process with signal 6 after a timeout):
>
> Now the status systemctl displays looks like this (some details omitted):
>iotwatch@NFS1.service - iotwatch I/O performance monitor instance "NFS1"
>    Loaded: loaded (/etc/iotwatch.conf; enabled; vendor preset: disabled)
>    Active: active (running) since Fri 2020-07-31 11:31:57 CEST; 6min ago
> ...
>  Main PID: 12246 (iotwatch-NFS1)
>     Tasks: 4 (limit: 512)
>    CGroup: /system.slice/system-iotwatch.slice/iotwatch@NFS1.service
>            └─12246 iotwatch-NFS1 -l /var/log/iotwatch/NFS1/iotwatch-NFS1.log
> ...
>
> Jul 31 11:31:57 v04 systemd[1]: Starting iotwatch I/O performance moni.....
> Jul 31 11:31:57 v04 systemd[1]: Started iotwatch I/O performance monit...".
> Jul 31 11:31:59 v04 systemd-coredump[12190]: Process 7123 (iotwatch-NFS1...
>
>                                                   Stack trace of thread 7123:
> Hint: Some lines were ellipsized, use -l to show in full.
>
> Now the odd thing is the sequence of journal entries:
> The service started and then dumped core, but is still active?
>
> Actually what happened was: a client told the server process to stop, and when
> it did not stop in time, the client killed it with signal 6. So the old process
> stopped. Only THEN was the new process started (PID 12246).
>
> Is the ordering due to the core dump completing only after the new service had
> started? Or is it just systemd's archiving of the core dump file? I thought the
> PID stays visible until the core dump has completed....

When a coredump happens we collect it and then process it, i.e.
compress it, generate a stack trace from it, and only when all of that
is done do we report it to the journal. Hence the coredump processing
may finish long after the service has entered another cycle.

In fact, we run the coredump processing at nice level 9, since the
stack trace analysis, the compression and writing everything to disk
can be quite CPU- and IO-intensive; we deprioritize it so that it does
not affect the scheduling of everything else too negatively. This
however means that if CPU or IO is scarce, the coredump processing
might be delayed quite a bit.
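For what it's worth, coredumpctl makes this reordering visible after
the fact; a minimal sketch, using the executable name and old PID from
the status output above (the TIME column reflects the journal entry
written once processing has finished, not the moment of the crash):

    # list archived core dumps for the crashed executable
    coredumpctl list iotwatch-NFS1

    # show details (signal, timestamps, core file location) for the old PID
    coredumpctl info 7123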

Lennart

--
Lennart Poettering, Berlin

