[systemd-devel] Too little information is shown when system enters emergency mode

Tue Oct 23 08:09:10 PDT 2012

On Mon, 22.10.12 11:41, Zbigniew Jędrzejewski-Szmek (zbyszek at in.waw.pl) wrote:

> > Please note the version of systemd (v44) in openSUSE doesn't have all
> > the needed bits to always display on the screen why dependency failed
> > (and you end up in emergency mode). This is fixed with systemd 195 which
> > should land in Factory pretty soon.
> As an experiment, I tried the same (add '/dev/sda9 /mnt' to /etc/fstab)
> under v194-138-g20f59e4, i.e. very recent. After rebooting all I see is
> the emergency mode prompt.
> 
> Now the problem is that 'dev-sda9.device' is loaded & inactive(dead).
> This means that it doesn't show up in --failed. So 'systemctl' with
> various options doesn't show what failed in an easy to recognize way.
> 
> OTOH 'journalctl -b' is immensely useful:
> """<red>
> Timed out waiting for device dev-disk-by\x2duuid\x5cx2fdev\x5cx2fsda9.device.
> Dependency failed for /mnt.
> Dependency failed for Local File Systems.
> ...
> </red>"""
> 
> This is great, and it would be really nice to expose it more. I guess that
> the first change would be to advertise 'journalctl -b' in the emergency
> mode intro.

I added this yesterday, it is included in 195.

> Would be nice to also un-escape the device name: "Timed out waiting
> for device /dev/sda9" should be much more understandable for the
> non-systemd-knowledgeable person than "Timed out waiting for device
> dev-disk-by\x2duuid\x5cx2fdev\x5cx2fsda9.device."

This is a cool idea. Adding the "inverse" of unit_name_mangle(), and
showing its result when a unit with no sane description string fails,
sounds like an awesome idea (though we probably should show
the mangled name as well, dunno, might be useful for proficient folks).

The fix might actually be simple, we could transparently do this in
unit_description() on access.

Added this to the TODO list.
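
To illustrate the un-escaping step, here is a minimal Python sketch. It
is not systemd's implementation (which is C); the function name
unescape_unit_path() is made up for this example. It assumes the usual
unit name escaping rules: "/" is written as "-", other special characters
are written as "\xXX", and the root path is written as a single "-".

```python
import re

# Matches either a C-style "\xXX" escape or a literal "-".
_ESCAPE_RE = re.compile(r"\\x([0-9a-fA-F]{2})|-")


def unescape_unit_path(unit_name):
    """Turn e.g. 'dev-sda9.device' back into '/dev/sda9' (illustrative only)."""
    # Drop the unit type suffix (".device", ".mount", ...), if present.
    stem, dot, _suffix = unit_name.rpartition(".")
    if not dot:
        stem = unit_name
    # The root path is special-cased: it is escaped as a single "-".
    if stem == "-":
        return "/"

    def _decode(match):
        hex_digits = match.group(1)
        # "\xXX" becomes the character with that code; "-" becomes "/".
        return chr(int(hex_digits, 16)) if hex_digits else "/"

    return "/" + _ESCAPE_RE.sub(_decode, stem)


print(unescape_unit_path("dev-sda9.device"))
print(unescape_unit_path("mnt.mount"))
```

Note that a single pass over the name from the log above still leaves
"\x2f" sequences in the result, so a real implementation would have to
decide how to present such names.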

> But it would be best to provide a short status like:
> """
> systemd was trying to reach target 'default.target'
> (which points to 'Multi-User', multi-user.target), but failed,
> because device /dev/sda9 is missing (dev-disk-by\x2duuid\x5cx2fdev\x5cx2fsda9.device).
> in turn this caused '/mnt' mount to fail (mnt.mount),
> in turn this caused 'Local File Systems' target to fail (local-fs.target),
> ...
> in turn this caused 'Multi User' target to fail (multi-user.target).
> """

Uh, this is quite hard to do, since we don't track the reasons so
much. I wonder if a simple list of failures and failures-due-to-dep
wouldn't be sufficient, rather than prose here...
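
As a rough sketch of what such a flat list could look like, the Python
below sorts log messages into "failed" and "failed-due-to-dep" entries.
It only matches the two message prefixes quoted earlier in this thread;
the function name, the labels and the shortened device name in the sample
input are assumptions made for illustration, not anything systemd does.

```python
TIMEOUT_PREFIX = "Timed out waiting for device "
DEP_PREFIX = "Dependency failed for "

# Each known message prefix maps to the kind of failure it reports.
PREFIXES = (
    (TIMEOUT_PREFIX, "failed"),
    (DEP_PREFIX, "failed-due-to-dep"),
)


def summarize_failures(log_lines):
    """Return (kind, subject) pairs for the failure messages found."""
    summary = []
    for raw in log_lines:
        line = raw.strip()
        for prefix, kind in PREFIXES:
            if line.startswith(prefix):
                # Keep what follows the prefix, minus the trailing period.
                subject = line[len(prefix):].rstrip(".")
                summary.append((kind, subject))
                break
    return summary


# Sample input modelled on the log excerpt above (device name shortened).
log = [
    "Timed out waiting for device dev-sda9.device.",
    "Dependency failed for /mnt.",
    "Dependency failed for Local File Systems.",
    "Some unrelated message.",
]
for kind, subject in summarize_failures(log):
    print(f"{kind:18} {subject}")
```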

> And a hint how to e.g. temporarily disable the failing mount point. I
> admit that I'm not sure what is the proper way, short of editing
> /etc/fstab and rebooting.

I wonder if this is something to handle with the "explanation" database
(aka "message catalogue") I want to add to the journal. This would
optionally augment log entries with static info from the vendor about
the issue, with longer help, links and support contact. All this would
be keyed off the message ID of a message, and be translated to the local
language of the user. My idea is to expose this with "journalctl -e" or
so, where every log line gets this data attached to it, in a block below
each line, where it is available.

Using the explanation database is a great way to handle this and more
errors and get translation and links for free. 
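
The lookup described above can be sketched in a few lines of Python.
Everything here is hypothetical: the message ID, the catalogue contents
and the function names are invented for the example, and the real data
format is not decided in this thread. The sketch only shows the idea of
keying explanations off a message ID, picking the user's language with a
fallback, and printing the explanation in a block below the log line.

```python
# Hypothetical catalogue: message ID -> locale -> explanation text.
CATALOGUE = {
    "00000000000000000000000000000001": {
        "en": "A device listed in /etc/fstab did not show up in time.",
        "de": "Ein in /etc/fstab eingetragenes Gerät ist nicht rechtzeitig erschienen.",
    },
}


def explain(message_id, locale, fallback_locale="en"):
    """Return the explanation for a message ID, or None if there is none."""
    translations = CATALOGUE.get(message_id)
    if translations is None:
        return None
    # Prefer the user's language, fall back to the default one.
    return translations.get(locale) or translations.get(fallback_locale)


def render(log_entries, locale):
    """Return output lines: each log line, then its explanation if available."""
    lines = []
    for message_id, text in log_entries:
        lines.append(text)
        explanation = explain(message_id, locale) if message_id else None
        if explanation:
            lines.append("    " + explanation)
    return lines


entries = [
    ("00000000000000000000000000000001", "Timed out waiting for device."),
    (None, "A message without a message ID."),
]
print("\n".join(render(entries, "fr")))
```

Entries without a message ID, or with an ID that has no catalogue entry,
are printed unchanged.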

> Would be nice if this output could be easily retrieved again. If the
> user starts looking at the system, and then forgets what exactly
> failed, he or she should be able to repeat this short diagnosis.

Sounds like a job for "journalctl -ebp err" or so, if we have the
explanation database?

Lennart

-- 
Lennart Poettering - Red Hat, Inc.