[systemd-devel] Shutdown/reboot problem: root hd powered off too early?

Wed Nov 23 10:01:37 UTC 2016

Hi,

I have a question about the shutdown/reboot phase.  It might be a kernel
problem, but I was told in the Arch Linux forums to ask here too, as I
also have trouble logging the relevant information correctly.  I will
try to be as specific as I can, but I have to apologize in advance: I'm
no low-level Linux and/or systemd expert.

I'm running Arch Linux on a second disk that I have installed in my
MacBookPro, in place of the broken optical drive [*].  This hd contains
my root and /home file systems (only /boot is on the first hd, alongside
macOS).  Since the kernel update to version 4.8, when shutting down the
machine this second disk seems to be powered off too early.  I can
clearly hear a 'clank!' sound, and I'm left with nothing but the
following lines on the screen afterwards (transcribed from a photo of
the screen taken with my phone):

ata2: exception Emask 0x10 SAct 0x0 SErr 0x1990000 action 0xe
frozenbd204ed5c
ata2: irq_stat 0x00400000, PHY RDY changed
ata2: SError: { PHYRdyChg 10B8B Dispar LinkSeq TrStaTrns }

After a few seconds, the system shuts down or reboots nevertheless.
This issue is related to the kernel version.  When running an older
version like 4.4 (Arch's current LTS version), I don't hear the 'clank'
(and I don't see the ata2 error messages).  It first appeared with
kernel 4.8.  Versions 4.7, 4.6 and 4.5 did not have this issue either.
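
As a sanity check on the transcription, the SErr value 0x1990000 from the
ata2 lines above can be decoded by hand.  The following is only a sketch:
the function name decode_serr is made up for illustration, and the bit
positions (16 to 26 for the diagnostic flags) are an assumption taken from
the kernel's libata headers, with the labels libata prints.

```sh
#!/bin/sh
# Decode the diagnostic half (bits 16..26) of a SATA SError value.
# Assumption: bit order follows the kernel's libata definitions.
decode_serr() {
    value=$(( $1 ))
    bit=16
    out=""
    for name in PHYRdyChg PHYInt CommWake 10B8B Dispar BadCRC \
                Handshk LinkSeq TrStaTrns UnrecFIS DevExch; do
        # Append the label if the corresponding bit is set.
        if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
            out="${out:+$out }$name"
        fi
        bit=$(( bit + 1 ))
    done
    echo "$out"
}

decode_serr 0x1990000
```

For 0x1990000 this prints the same five flags as the SError line above
(PHYRdyChg 10B8B Dispar LinkSeq TrStaTrns), so at least that part of the
photo transcription is self-consistent.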

I tried to follow the systemd docs
(https://freedesktop.org/wiki/Software/systemd/Debugging/#index2h1) for
logging information during late shutdown, but I had no luck (probably
since the root file system is already dead when the script tries to
remount it in order to store the dmesg output).

Instead, I used journalctl -b -1 -nall > kernel4x-log.txt to generate
two log files, one using kernel 4.4, and one using kernel 4.8 (both with
systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M enforcing=0
as kernel parameters):

https://dl.dropboxusercontent.com/u/36715290/kernel44-log.txt
https://dl.dropboxusercontent.com/u/36715290/kernel48-log.txt

However, after skimming through these logs, it seems to me that there
are problems even on kernel 4.4.  Towards the end, both files contain
lots of lines like:

"Failed to send unit remove signal for XYZ.{target,socket,unit}:
Transport endpoint is not connected"

From this it appears to me that my setup might have been misconfigured
from the beginning, and that I only noticed it because of the loud
mechanical sound and the more obvious error messages since 4.8.
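
To put a number on the "Failed to send unit remove signal" lines, the two
saved logs can be compared with a quick count.  This is a minimal sketch,
assuming the two files from the links above have been downloaded into the
current directory; the helper name count_remove_failures is made up for
illustration.

```sh
#!/bin/sh
# Count the "unit remove signal" failures in a saved journal dump.
count_remove_failures() {
    grep -c 'Failed to send unit remove signal' "$1"
}

# Report the count for each log that is actually present.
for log in kernel44-log.txt kernel48-log.txt; do
    if [ -r "$log" ]; then
        printf '%s: %s\n' "$log" "$(count_remove_failures "$log")"
    fi
done
```

If both kernels show a similar count, these messages are probably not
related to the disk being powered off early.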

So, my question is: is this a kernel regression in 4.8 that should be
reported to the kernel devs?  Or is it just a misconfiguration on my side?
If the latter is true, what can I do to tell systemd and/or the kernel
to keep the second hd powered until the very end?

Here's the link to the Arch Linux forum thread where I first asked for
help: https://bbs.archlinux.org/viewtopic.php?pid=1668277

Thanks in advance, and once again sorry for my non-expert tone.

[*] Here's a link to the hd caddy I'm using:
https://www.amazon.com/dp/B0090KFOYS

Best,
Johannes