[systemd-devel] RFC: idea for a pstore systemd service
Eric DeVolder
eric.devolder at oracle.com
Tue Jan 15 17:23:16 UTC 2019
Systemd-devel,
Below is a write-up explaining a proposed new service for archiving
pstore contents. I've attached the files for pstore.service
(/lib/systemd/system/pstore.service and bin/pstore-tool). These are
trivial right now, but easy to build upon if periodic, rather than just
on-boot, examination of the pstore is desirable.
The questions I have for you are:
- Is a new unit, pstore.service, the right approach for this? If not, which
unit do you recommend augmenting with these actions?
- What are your thoughts/comments/feedback on such a service?
Thank you in advance for your time,
Eric
==== Oracle ERST usage ====
The BIOS ACPI error record serialization table, ERST, is an API for
storing data, such as hardware error records, into non-volatile storage
[1, Section 18.5 Error Serialization]. The ERST non-volatile storage on
Oracle servers tends to be small, on the order of 64KiB.
The Linux persistent storage subsystem, pstore, supports using the ERST
as a backend for persistent storage [2].
When booted with the crash_kexec_post_notifiers command-line option, the
kernel stores the dmesg log into pstore on a panic [3]. This works
independently of kdump, so the crash backtrace is captured into pstore
for post-mortem analysis regardless of whether kdump is enabled or
working properly.
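Whether a running system was booted with this option can be checked from userspace. The following is a minimal sketch, assuming the kernel command line is readable from /proc/cmdline; the function name has_kernel_param is hypothetical, and its optional second argument exists only so the check can be tried against a sample command-line file:

```sh
#!/bin/sh
# Hypothetical helper: succeed if the kernel command line contains the
# parameter $1, either bare ("foo") or with a value ("foo=...").
# $2 optionally names the file to read; it defaults to /proc/cmdline.
has_kernel_param()
(
    # The body runs in a subshell, so "set -f" (no globbing of the
    # command-line words) does not leak into the caller.
    set -f
    param=$1
    cmdline_file=${2:-/proc/cmdline}
    [ -r "$cmdline_file" ] || exit 1
    # Unquoted expansion splits the command line into words.
    for word in $(cat "$cmdline_file"); do
        case $word in
            "$param"|"$param"=*) exit 0 ;;
        esac
    done
    exit 1
)

if has_kernel_param crash_kexec_post_notifiers; then
    echo "crash_kexec_post_notifiers is set"
else
    echo "crash_kexec_post_notifiers is not set"
fi
```

Matching whole words through `case` avoids a false positive on a parameter that merely shares a prefix, such as `crash_kexec` versus `crash_kexec_post_notifiers`.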
Since the ERST area is typically small, the dmesg contents from a single
kernel panic can easily fill it. As such, the kernel dmesg items in the
pstore need to be archived to a normal filesystem and then freed, making
room for the dmesg of a subsequent kernel panic.
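To gauge how close the store is to full, the sizes of the dmesg records currently exposed under /sys/fs/pstore can be summed and compared with the roughly 64KiB ERST area. This is only a rough sketch: the sizes pstore reports are those of the exposed (possibly decompressed) payloads and may differ from the space the backend actually consumes. The function name pstore_dmesg_bytes and the 65536-byte capacity are assumptions for illustration:

```sh
#!/bin/sh
# Hypothetical helper: print the total size, in bytes, of the dmesg
# records in a pstore directory (default /sys/fs/pstore).
pstore_dmesg_bytes()
{
    dir=${1:-/sys/fs/pstore}
    total=0
    for f in "$dir"/dmesg-*; do
        # An unmatched glob is left as the literal pattern; skip it.
        [ -e "$f" ] || continue
        total=$((total + $(wc -c < "$f")))
    done
    echo "$total"
}

# Assumed ERST capacity of 64 KiB, for illustration only.
capacity=65536
used=$(pstore_dmesg_bytes)
echo "dmesg records: $used of ~$capacity bytes ($((used * 100 / capacity))%)"
```

By their reported sizes, the three records in the listing below total 17716 + 17731 + 17679 = 53126 bytes, roughly 81% of 65536 bytes.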
Therefore, this is a proposal for a new service, pstore.service, that
will archive the dmesg contents in the pstore to a regular filesystem,
and remove those dmesg entries from the pstore. Since Linux exposes the
persistent storage subsystem as a filesystem [2], with each pstore item
available as a regular file, archiving and removing the entries is
trivial. This proposal is for a new service, rather than an extension of
kdump.service, because this function is independent of kdump, even though
both are related to kernel crashes. Conceivably, other items stored in
pstore, like hardware errors, could have their own archiving rules. The
goal of pstore.service is to keep the pstore as empty as possible, so that
space is available for emergent events like hardware errors and kernel
crashes.
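Because each record is an ordinary file, archiving and removal reduce to copying the file out and then deleting it; deleting a file under /sys/fs/pstore is what tells the backend it may reclaim that record's space. A minimal sketch of that step follows, using a hypothetical function name, archive_record. As a design choice, it removes the record only after the copy has been compared byte for byte, so a failed write to the archive filesystem never destroys the only copy:

```sh
#!/bin/sh
# Hypothetical helper: copy one pstore record into an archive directory,
# verify the copy, and only then delete the original record.
# Usage: archive_record <record-file> <archive-dir>
archive_record()
{
    src=$1
    dest_dir=$2
    dest=$dest_dir/$(basename "$src")

    mkdir -p "$dest_dir" || return 1
    # -p keeps the record's timestamp on the archived copy.
    cp -p "$src" "$dest" || return 1
    # If the copy differs, keep the record in pstore and report failure.
    if ! cmp -s "$src" "$dest"; then
        echo "archive_record: copy of $src does not match" >&2
        return 1
    fi
    # Removing the file from pstorefs frees the backend record.
    rm -f "$src"
}
```

For example, `archive_record /sys/fs/pstore/dmesg-erst-6625975467788730369 /var/pstore/erst` would leave the record under /var/pstore/erst and erase it from pstore.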
Initially, the service could simply look for items at boot, but I could
see it being extended to periodically check the pstore for events like
hardware errors. Kernel crash dmesg items are named in a regular fashion,
such as:
-r--r--r-- 1 root root 17716 Nov 20 11:08 dmesg-erst-6625975467788730369
-r--r--r-- 1 root root 17731 Nov 20 11:08 dmesg-erst-6625975467788730370
-r--r--r-- 1 root root 17679 Nov 20 11:08 dmesg-erst-6625975467788730371
A simple bit of filename manipulation can then be used to create archive
sub-directories, say in /var/pstore, to hold the archived data.
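One possible form of that filename manipulation, sketched with POSIX parameter expansion, splits a name such as dmesg-erst-6625975467788730369 into its backend and record ID and derives a sub-directory from them. The function name pstore_archive_subdir and the <root>/<backend>/<id> layout are illustrative assumptions, not a settled format, and the split assumes the backend name itself contains no "-":

```sh
#!/bin/sh
# Hypothetical helper: map a pstore file name of the form
# <type>-<backend>-<id> (e.g. dmesg-erst-6625975467788730369) to an
# archive sub-directory <root>/<backend>/<id>; <root> defaults to
# /var/pstore.
pstore_archive_subdir()
{
    name=$(basename "$1")
    root=${2:-/var/pstore}
    rest=${name#*-}        # drop "<type>-":  erst-6625975467788730369
    backend=${rest%%-*}    # keep "<backend>": erst
    id=${rest#*-}          # keep "<id>":      6625975467788730369
    echo "$root/$backend/$id"
}
```

For example, `pstore_archive_subdir /sys/fs/pstore/dmesg-erst-6625975467788730369` prints `/var/pstore/erst/6625975467788730369`, and the record type, if needed, is `${name%%-*}`.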
[1] "Advanced Configuration and Power Interface Specification",
version 6.2, May 2017.
https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
[2] "Persistent storage for a kernel's dying breath",
March 23, 2011.
https://lwn.net/Articles/434821/
[3] "The kernel’s command-line parameters",
https://static.lwn.net/kerneldoc/admin-guide/kernel-parameters.html
-------------- next part --------------
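[Unit]
Description=pstore archive service
# Nothing to do unless pstore actually holds records.
ConditionDirectoryNotEmpty=/sys/fs/pstore
# The archive lives under /var/pstore, so local (and any remote)
# file systems must be mounted before the service runs.
Wants=local-fs.target remote-fs.target
After=local-fs.target remote-fs.target
[Service]
Type=oneshot
StandardOutput=journal+console
ExecStart=/root/pstore-tool start
ExecStop=/root/pstore-tool stop
# Stay "active" after ExecStart= finishes, so that ExecStop= runs at
# shutdown rather than immediately after the start action.
RemainAfterExit=yes
[Install]
# Started during normal boot. A service with default dependencies is
# ordered after local-fs.target, so it cannot be WantedBy= that target
# without creating an ordering cycle.
WantedBy=multi-user.target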
-------------- next part --------------
#!/bin/sh
# Utility script to archive contents of pstore
#-r--r--r--. 1 root root 1826 Dec 17 10:44 dmesg-efi-154506148323001
#-r--r--r--. 1 root root 1826 Dec 17 10:44 dmesg-efi-154506148324001
pstorefs=/sys/fs/pstore
archivedir=/var/pstore/$(date +"%Y-%m-%d-%H:%M")
pstore_start()
{
    echo "PSTORE manager started"
    # Note: The -r (reverse name order) is essential for dmesg reconstruction
    files=$(ls -r "$pstorefs"/dmesg-* 2>/dev/null)
    if [ -n "$files" ];
    then
        # Archive files
        mkdir -p "$archivedir" || return 1
        for f in $files;
        do
            # Reconstruct dmesg by appending each fragment in turn; move
            # the record out of pstore only if the append succeeded.
            # Removing the file from pstorefs frees the backend record.
            cat "$f" >> "$archivedir/dmesg.txt" &&
                mv -f "$f" "$archivedir"
        done
    fi
}
pstore_stop()
{
    echo "PSTORE manager stopped"
}
# POSIX sh has no [[ ]]; use [ ] so the script runs under /bin/sh.
while [ $# -gt 0 ]
do
    case $1 in
    start)
        pstore_start
        ;;
    stop)
        pstore_stop
        ;;
    *)
        echo "pstore-tool: unrecognized option: $1" >&2
        ;;
    esac
    shift # on to next argument
done