[systemd-devel] systemd unexpectedly dropping into rescue mode - how do I best debug this?
Ingo Molnar
mingo at kernel.org
Thu Oct 4 03:54:11 PDT 2012
* Kay Sievers <kay at vrfy.org> wrote:
> On Thu, Oct 4, 2012 at 12:05 PM, Ingo Molnar <mingo at kernel.org> wrote:
> >
> > * Tom Gundersen <teg at jklm.no> wrote:
> >
> >> Hi Ingo,
> >>
> >> On Thu, Oct 4, 2012 at 11:12 AM, Ingo Molnar <mingo at kernel.org> wrote:
> >> > I'm wondering how to debug the following systemd problem:
> >>
> >> [...]
> >>
> >> > Here are the units that are showing some sort of error:
> >> >
> >> > lyra:~> systemctl --all | grep -i err
> >> > exim.service error inactive dead exim.service
> >> > iscsi.service error inactive dead iscsi.service
> >> > iscsid.service error inactive dead iscsid.service
> >> > livesys-late.service error inactive dead livesys-late.service
> >> > livesys.service error inactive dead livesys.service
> >> > named.service error inactive dead named.service
> >> > postfix.service error inactive dead postfix.service
> >> > remount-rootfs.service error inactive dead remount-rootfs.service
> >> > ypserv.service error inactive dead ypserv.service
> >>
> >> You might get useful information from:
> >>
> >> # systemctl status remount-rootfs.service
> >>
> >> (and similarly for the other ones).
> >
> > Querying those gives me the following uninformative output:
>
> > Trying to start it again gives:
> >
> > [root at lyra ~]# systemctl start remount-rootfs.service
> > Failed to issue method call: Unit remount-rootfs.service failed
> > to load: No such file or directory. See system logs and
> > 'systemctl status remount-rootfs.service' for details.
>
> These are just units where other units try to order against, but which
> are not available. It's nothing wrong here, besides the misguiding
> output, which we should think about what to tell instead.
>
>
> Could you add:
> systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M
> to the kernel command line? It will print all userspace log to the
> kernel buffer, which is the most reliable way to store the logs when
> stuff goes wrong that early during bootup.
>
> It should tell us more what's going on, and why you end up in the rescue shell.
Sure, will try this straight away.
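Once booted with those options, something along these lines should pull systemd's own messages back out of the kernel buffer. This is only a sketch: it assumes each systemd line in the buffer carries a `systemd[PID]:` tag, and the function name and sample line are made up for illustration.

```shell
# Keep only the lines that systemd itself wrote to the kernel log.
# Assumption: each such line is tagged "systemd[PID]:", e.g. the
# illustrative line "[    2.345678] systemd[1]: Reached target Basic System."
systemd_kmsg_lines() {
    grep -E 'systemd\[[0-9]+\]:'
}

# On the affected machine (not run here):
#   dmesg | systemd_kmsg_lines
```

The filter reads from stdin, so it works equally well on a saved copy of the dmesg output.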
> This is the wiki page about debugging:
> http://www.freedesktop.org/wiki/Software/systemd/Debugging
Yeah, I've seen that - but I kind of expected that the bootup
not going well is one of the most frequent failure cases for
systemd - and I expected that once such a common failure mode
happens all the info is there to recover and find the reason for
the failure.
Having to reboot the system really destroys failure state, and
it's a lucky circumstance that I can reproduce the failure and
can give you debug output.
So I'm kind of surprised that I have to reboot the system to
debug it further and that all the available error output is so
unhelpful. With such a scheme, how do you debug spurious
dependency/bootup bugs, or non-deterministic parallel bootup
bugs/races, if it's not possible to get in situ data from
production systems?
So for example, do we know from the info I've provided why
systemd went into rescue mode? I suspect some target failed -
can I list which targets failed for the default.target
(multi-user.target) and re-try to load them?
I'm just trying to figure out how to analyze this best without
having to reboot the system.
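To make the question concrete, a rough way to separate units that merely failed to load (the `error` rows quoted above) from units that actually entered the `failed` state might look like this. It is a sketch only: it assumes the UNIT/LOAD/ACTIVE/SUB column order seen in the `systemctl --all` output earlier in this thread, and both function names are hypothetical.

```shell
# Both helpers read `systemctl --all` style output on stdin.
# Assumed columns: UNIT LOAD ACTIVE SUB DESCRIPTION...

# Units whose unit file could not be loaded (LOAD column is "error").
units_with_load_error() {
    awk '$2 == "error" { print $1 }'
}

# Units that were loaded but ended up in the "failed" ACTIVE state.
units_in_failed_state() {
    awk '$3 == "failed" { print $1 }'
}

# On the affected machine (not run here):
#   systemctl --all | units_with_load_error
#   systemctl --all | units_in_failed_state
```

Keeping the two cases apart matters here, since the `error` rows apparently only mean that a referenced unit file does not exist.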
My best guess is that something is missing from the kernel I've
booted - but it's strange that I'm unable to analyze/debug it
from the failed, rescue state itself.
If I Ctrl-D out of the rescue shell, it reaches the multi-user
target just fine. Is that expected?
Thanks,
Ingo