[systemd-devel] Troubleshooting Failed Nspawn Starts

Rich Freeman r-systemd at thefreemanclan.net
Fri Aug 14 11:23:40 PDT 2015


On Fri, Aug 14, 2015 at 1:30 PM, Lennart Poettering
<mzerqung at 0pointer.de> wrote:
> On Mon, 10.08.15 08:03, Rich Freeman (r-systemd at thefreemanclan.net) wrote:
>
> We have watchdog (see WatchdogSec= documentation in
> systemd.service(5)) support in all our long-running daemons, and PID 1
> will kill the service and generate a backtrace for them if they don't
> send a watchdog message often enough. So actually we should be pretty
> good here...

Thanks.  In this case I'm not sure if it is needed more for nspawn
itself, or for systemd (which probably won't work unless nspawn
supports watchdog), or for journald/etc.
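
For reference, here is a rough sketch of what the watchdog keep-alive
looks like from the service side, assuming the unit sets WatchdogSec=
and the process is allowed to send notifications. The function names
are made up for illustration; the NOTIFY_SOCKET and WATCHDOG_USEC
environment variables and the "WATCHDOG=1" message are the ones
described in sd_notify(3) and sd_watchdog_enabled(3). This is not how
nspawn or journald implement it (they are written in C), just the same
protocol in a few lines:

```python
import os
import socket


def sd_notify(message: str) -> bool:
    """Send one notification datagram to the service manager.

    Returns False when NOTIFY_SOCKET is unset, i.e. when the process
    was not started by a manager that expects notifications.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    # A leading '@' denotes a Linux abstract-namespace socket.
    if path.startswith("@"):
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), path)
    return True


def watchdog_interval_seconds():
    """Return half the configured watchdog timeout, or None if unset.

    WATCHDOG_USEC is in microseconds; pinging at half the timeout is
    the interval the documentation recommends.
    """
    usec = os.environ.get("WATCHDOG_USEC")
    if not usec:
        return None
    return int(usec) / 1_000_000 / 2
```

A long-running loop would call sd_notify("WATCHDOG=1") roughly every
watchdog_interval_seconds() seconds.  If the pings stop, PID 1 treats
the service as hung, which is the behavior described above.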

>
>> Example of a frozen container:
>>
>> systemctl status mariadb-contain
>> ● mariadb-contain.service - mariadb container
>>    Loaded: loaded (/etc/systemd/system/mariadb-contain.service; enabled; vendor preset: enabled)
>>    Active: active (running) since Mon 2015-08-10 07:21:48 EDT; 37min ago
>>      Docs: man:systemd-nspawn(1)
>>  Main PID: 1033 (systemd-nspawn)
>>    Status: "Container running."
>>    CGroup: /system.slice/mariadb-contain.service
>>            ├─1033 /usr/bin/systemd-nspawn --quiet --keep-unit --boot --link-journal=guest --directory=/sstorage3/cont...
>>            ├─1044 /usr/lib/systemd/systemd
>>            └─system.slice
>>              ├─systemd-journald.service
>>              │ └─1407 /usr/lib/systemd/systemd-journald
>>              └─systemd-journal-flush.service
>>                └─1340 /usr/bin/journalctl --flush
>
> Hmm, this is really weird... Would be good to get a backtrace of both
> journald and journalctl here. Note that journald has a much higher PID
> than journalctl though, which indicates that it might have gotten
> restarted by systemd already...

I'll look to get one.

>
> journalctl --flush actually pretty much only sends SIGUSR1 to
> journald, but does this through PID1's bus APIs... It then waits for a
> file in /run/systemd/journal/flushed to appear... For some reason that
> doesn't work here... Weird...

I'm actually wondering if it is some kind of D-Bus API issue.  I don't
have the details at hand for this email, but I seem to recall seeing an
error that mentioned dbus in a situation like this.
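
To check from the outside whether the flush ever completes, something
like the following could be run while the container is stuck.  It is
only a polling sketch with a timeout; the helper name and the default
timeout are my own choices, and journalctl itself may well wait for
the file in a different way.  The only detail taken from the
description above is the /run/systemd/journal/flushed path:

```python
import os
import time


def wait_for_file(path, timeout=10.0, poll=0.1):
    """Poll until `path` exists or `timeout` seconds have elapsed.

    Returns True if the file appeared in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


if __name__ == "__main__":
    flushed = "/run/systemd/journal/flushed"
    if wait_for_file(flushed, timeout=30.0):
        print("journal flush completed")
    else:
        print("flush marker never appeared:", flushed)
```

Run inside the container, a False result would at least confirm that
journald never creates the marker, as opposed to journalctl failing to
notice it.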

>
> Anyway, before tracking this down further, could you update to a more
> recent systemd version?

That's fair to ask.  I'll see about doing just that.  Perhaps it will
resolve the issue as a bonus.  I've been seeing this for a while
though.

The other issue I see sometimes is restarting an nspawn container with
bridged ethernet and having it fail with an error that the interface
is already in use.  After I update I'll see if I can get more info on
that (though in that case everything terminates).

--
Rich

