[systemd-devel] Too little information is shown when system enters emergency mode

Zbigniew Jędrzejewski-Szmek zbyszek at in.waw.pl
Mon Oct 22 02:41:42 PDT 2012


On Sun, Oct 21, 2012 at 03:13:22PM +0200, Frederic Crozat wrote:
> On Sunday, 21 October 2012 at 15:59 +0400, Andrey Borzenkov wrote:
> > This issue comes up relatively often on openSUSE forums. Users
> > complain that when the system drops into emergency mode, there is nothing
> > that would explain to the user why it happened or what to do. Typical situation is
> > https://bugzilla.novell.com/show_bug.cgi?id=782904.
> > 
> > openSUSE by default is using "splash quiet" kernel parameter. So the
> > first issue is, interpretation of "quiet" changed in systemd. Now it
> > means suppress all output of systemd services. As a result we have the
> > following (even without boot splash involved) when some device in
> > fstab is missing:
> > 
> > doing fast boot
> > Creating device nodes with udev
> > Waiting for device /dev/root to appear:  ok
> > fsck from util-linux 2.21.2
> > [/sbin/fsck.ext4 (1) -- /] fsck.ext4 -a /dev/sda6
> > /dev/sda6: clean, 31805/705744 files, 344231/2819584 blocks
> > fsck succeeded. Mounting root device read-write.
> > Mounting root /dev/root
> > mount -o rw,acl,user_xattr -t ext4 /dev/root /root
> > [   10.706463] piix4_smbus 0000:00:07.3: SMBus base address
> > uninitialized - upgrade BIOS or use force_addr=0xaddr
> > Welcome to emergency mode. Use "systemctl default" or ^D to enter default mode.
> > Give root password for login:
> > 
> > This is literally everything that user sees on console. My first
> > reaction was to add "systemctl --failed" as pre-exec to emergency.
> > Unfortunately:
> > 
> > linux-q652:~ # systemctl --no-pager --failed
> > UNIT LOAD   ACTIVE SUB JOB DESCRIPTION
> > 
> > LOAD   = Reflects whether the unit definition was properly loaded.
> > ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
> > SUB    = The low-level unit activation state, values depend on unit type.
> > JOB    = Pending job for the unit.
> > 
> > 0 units listed. Pass --all to see inactive units, too.
> > 
> > Everything is fine. This is understandable - we are now in different
> > transaction and as far as I understand, systemctl --failed shows only
> > results of currently active transaction (am I right?).
> > 
> > Only when "quiet" is turned off, do I really see something (again -
> > assuming we do not have bootsplash ...)
> > 
> > Started /boot/efi                                                      [  OK  ]
> > Dependency failed. Aborted start of /mnt                               [ ABORT]
> > Dependency failed. Aborted start of Login Service                      [ ABORT]
> > Dependency failed. Aborted start of D-Bus System Message Bus           [ ABORT]
> > Welcome to emergency mode. Use "systemctl default" or ^D to enter default mode.
> > 
> > So right now if anything goes extremely wrong we have a baffled user
> > sitting before the "emergency mode" prompt and not knowing what to do
> > next. Is it considered a problem by someone else? Would it be feasible
> > to turn off "quiet" and bootsplash immediately after any unit failed
> > during system boot?
> 
> Please note the version of systemd (v44) in openSUSE doesn't have all
> the needed bits to always display on the screen why dependency failed
> (and you end up in emergency mode). This is fixed with systemd 195 which
> should land in Factory pretty soon.

As an experiment, I tried the same (add '/dev/sda9 /mnt' to /etc/fstab)
under v194-138-g20f59e4, i.e. very recent. After rebooting all I see is
the emergency mode prompt.

Now the problem is that 'dev-sda9.device' is loaded & inactive(dead).
This means that it doesn't show up in --failed. So 'systemctl' with
various options doesn't show what failed in an easy to recognize way.

OTOH 'journalctl -b' is immensely useful:
"""<red>
Timed out waiting for device dev-disk-by\x2duuid\x5cx2fdev\x5cx2fsda9.device.
Dependency failed for /mnt.
Dependency failed for Local File Systems.
...
</red>"""

This is great, and it would be really nice to expose it more. I guess that
the first change would be to advertise 'journalctl -b' in the emergency
mode intro.

Would be nice to also unescape the device name: "Timed out waiting
for device /dev/sda9" should be much more understandable for the
non-systemd-knowledgeable person than "Timed out waiting for device
dev-disk-by\x2duuid\x5cx2fdev\x5cx2fsda9.device."
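
To make the idea concrete, here is a rough Python sketch of turning a
path-derived unit name back into a path. It assumes the usual escaping
rules ("/" becomes "-", other special characters become \xXX) and handles
ASCII only; the function name unescape_unit_path is made up for this
illustration and is not part of systemd.

```python
import re

_HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")


def unescape_unit_path(unit: str) -> str:
    """Turn e.g. 'dev-sda9.device' back into '/dev/sda9' (ASCII only)."""
    # Drop the unit type suffix (".device", ".mount", ...).
    name = unit.rsplit(".", 1)[0]
    # The root directory is escaped as a single "-".
    if name == "-":
        return "/"
    # Split on "-" *before* decoding, so that an escaped "\x2d" is not
    # mistaken for a path separator afterwards.
    parts = name.split("-")
    decoded = [
        _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), part)
        for part in parts
    ]
    return "/" + "/".join(decoded)


if __name__ == "__main__":
    print(unescape_unit_path("dev-sda9.device"))
```

A message formatter could call something like this before printing, and
keep the raw unit name in parentheses for people who need it.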

But it would be best to provide a short status like:
"""
systemd was trying to reach target 'default.target'
(which points to 'Multi-User', multi-user.target), but failed,
because device /dev/sda9 is missing (dev-disk-by\x2duuid\x5cx2fdev\x5cx2fsda9.device).
in turn this caused '/mnt' mount to fail (mnt.mount),
in turn this caused 'Local File Systems' target to fail (local-fs.target),
...
in turn this caused 'Multi-User' target to fail (multi-user.target).
"""
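
A summary like this could in principle be assembled from the messages
that already end up in the journal. The Python sketch below is only an
illustration: it assumes the two message formats quoted earlier ("Timed
out waiting for device X." and "Dependency failed for Y."), and the
function name summarize_failures is hypothetical.

```python
import re

_TIMEOUT = re.compile(r"Timed out waiting for device (.+)\.$")
_DEPFAIL = re.compile(r"Dependency failed for (.+)\.$")


def summarize_failures(lines):
    """Build a short cause-and-effect summary from journal messages."""
    summary = []
    for line in lines:
        line = line.strip()
        m = _TIMEOUT.match(line)
        if m:
            # Root cause: a device that never showed up.
            summary.append(f"device {m.group(1)} did not appear")
            continue
        m = _DEPFAIL.match(line)
        if m:
            # Consequence of an earlier failure in the chain.
            summary.append(f"in turn this caused '{m.group(1)}' to fail")
    return summary


if __name__ == "__main__":
    messages = [
        "Timed out waiting for device dev-sda9.device.",
        "Dependency failed for /mnt.",
        "Dependency failed for Local File Systems.",
    ]
    print("\n".join(summarize_failures(messages)))
```

This only strings messages together in the order they were logged; a real
implementation would have to follow the actual dependency graph.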

A hint on how to, e.g., temporarily disable the failing mount point
would help too. I admit that I'm not sure what the proper way is, short
of editing /etc/fstab and rebooting.

Would be nice if this output could be easily retrieved again. If the
user starts looking at the system, and then forgets what exactly
failed, he or she should be able to repeat this short diagnosis.
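
Until something like that exists, a small wrapper around the journal can
serve as a repeatable diagnosis. The sketch below is an assumption about
how one might do it, not an existing tool: it runs 'journalctl -b'
restricted to error priority with '-p err', and the function names are
made up.

```python
import shutil
import subprocess


def boot_log_command(priority="err"):
    """Command line that lists this boot's messages at the given priority."""
    return ["journalctl", "-b", "-p", priority, "--no-pager"]


def show_boot_errors(priority="err"):
    """Return the filtered boot log, or None if journalctl is unavailable."""
    if shutil.which("journalctl") is None:
        return None
    result = subprocess.run(
        boot_log_command(priority),
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit should not raise here
    )
    return result.stdout


if __name__ == "__main__":
    output = show_boot_errors()
    print(output if output is not None else "journalctl not found")
```

Because it only reads the journal, it can be re-run as often as needed
from the emergency shell.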

> However, on a more general basis (not openSUSE specific), I think we
> should add some special handling in systemd for a kernel command line
> option (for instance debug or debug=1), which would "expand" into
> "systemd.log_level=debug systemd.log_target=kmsg". This would be much
> easier to tell users when debug is needed and we could also add an
> additional menu entry in bootloader (under the "advanced settings") so
> this setting would always be available, if needed.

Yes, this would be useful, but this would require a reboot, and there
must be an easier way to debug an already failed system. Sometimes the
problem is not easily repeatable.
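
For reference, the expansion proposed in the quoted paragraph is simple
to express. The Python sketch below only illustrates the mapping from
"debug" or "debug=1" to the two systemd options named there; the function
name expand_debug is hypothetical, and where such logic would live in
systemd is left open.

```python
DEBUG_EXPANSION = ("systemd.log_level=debug", "systemd.log_target=kmsg")


def expand_debug(cmdline: str) -> str:
    """Append the systemd debug options if 'debug' or 'debug=1' is present."""
    tokens = cmdline.split()
    if "debug" in tokens or "debug=1" in tokens:
        for option in DEBUG_EXPANSION:
            # Do not duplicate options the user already passed explicitly.
            if option not in tokens:
                tokens.append(option)
    return " ".join(tokens)


if __name__ == "__main__":
    print(expand_debug("root=/dev/sda6 splash quiet debug"))
```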

Zbyszek