[systemd-devel] RFC: idea for a pstore systemd service

Eric DeVolder eric.devolder at oracle.com
Tue Jan 15 17:23:16 UTC 2019


Systemd-devel,

Below is a write-up I've done to explain a new service for archiving 
pstore contents. I've attached the pstore.service files 
(/lib/systemd/system/pstore.service and bin/pstore-tool). These are 
trivial right now, but easy to build upon if periodic, rather than just 
on-boot, examination of the pstore is desirable.

The questions I have for you are:

- Is a new unit pstore.service the right approach for this? If not, what 
unit do you recommend augmenting with these actions?

- What are your thoughts/comments/feedback on such a service?

Thank you in advance for your time,
Eric

==== Oracle ERST usage ====
The BIOS ACPI Error Record Serialization Table (ERST) is an API for 
storing data, such as hardware errors, into non-volatile storage [1, 
Section 18.5 Error Serialization]. The ERST non-volatile storage on 
Oracle servers tends to be small, on the order of 64KiB.

The Linux persistent storage subsystem, pstore, supports using the ERST 
as a backend for persistent storage [2].

The kernel, with the crash_kexec_post_notifiers command line option, 
stores the dmesg into pstore on a panic [3]. This action is available 
independent of kdump; as such, the crash backtrace is captured into 
pstore for post mortem analysis, regardless of whether kdump is enabled 
or working properly.
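
As a quick way to see whether a running system was booted with that 
option, a small check against the kernel command line could look like 
the sketch below. The function name has_kernel_param is made up for 
this illustration, and the check only tests for the presence of the 
parameter, not its value.

```sh
#!/bin/sh
# Sketch only: has_kernel_param is a hypothetical helper name.
# Return success if the kernel command line contains the parameter.
# $1: parameter name, $2: command line text (defaults to /proc/cmdline).
has_kernel_param()
{
    param=$1
    cmdline=${2-$(cat /proc/cmdline 2>/dev/null)}
    for word in $cmdline
    do
        case $word in
            "$param"|"$param"=*) return 0 ;;
        esac
    done
    return 1
}

if has_kernel_param crash_kexec_post_notifiers
then
    echo "crash_kexec_post_notifiers is set"
else
    echo "crash_kexec_post_notifiers is not set"
fi
```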

Since the ERST area is typically small, it is easily filled with the 
contents of dmesg upon a kernel panic. As such, there is a need to 
archive the contents of kernel dmesg items in the pstore to a normal 
filesystem, and then free the dmesg items in the pstore in order to make 
room for the dmesg of a subsequent kernel panic.

Therefore, this is a proposal for a new service, pstore.service, that 
will archive the dmesg contents in the pstore to a regular filesystem, 
and remove those dmesg entries from the pstore.  Since Linux exposes the 
persistent storage subsystem as a filesystem [2], and the items in the 
pstore are available as regular files, this makes archiving and removal 
of the entries trivial. This proposal is for a new service instead of 
augmenting kdump.service since this is independent of kdump, though both 
are related to a kernel crash. Conceivably other items that are stored 
in pstore, like hardware errors, could have their own rules for 
archiving. The goal of the pstore.service is to attempt to keep the 
pstore empty and available for emergent events like hardware errors and 
kernel crashes.

Initially the service could be as simple as looking for items upon boot, 
but I could see it being extended to periodically check for events like 
hardware errors in the pstore. Kernel crash dmesg items are named in a 
regular fashion, such as:

-r--r--r-- 1 root root 17716 Nov 20 11:08 dmesg-erst-6625975467788730369
-r--r--r-- 1 root root 17731 Nov 20 11:08 dmesg-erst-6625975467788730370
-r--r--r-- 1 root root 17679 Nov 20 11:08 dmesg-erst-6625975467788730371

And a simple bit of filename manipulation can be used to create archive 
sub-directories, say in /var/pstore, with the archived data.
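
To make the filename manipulation concrete, here is a minimal sketch 
that splits an entry name of the form <type>-<backend>-<id> using plain 
POSIX parameter expansion. The function name archive_subdir, the 
archive_root variable, and the <root>/<backend>/<id> layout are 
assumptions for illustration only; they are not part of the attached 
pstore-tool, and a real tool might prefer to group all parts of one 
panic into a single directory.

```sh
#!/bin/sh
# Sketch only: the layout <archive_root>/<backend>/<id> is an assumption.
archive_root=/var/pstore

# Print the archive sub-directory for a pstore entry name such as
# dmesg-erst-6625975467788730369 (format: <type>-<backend>-<id>).
archive_subdir()
{
    name=${1##*/}        # strip any leading directory
    id=${name##*-}       # text after the last '-'
    rest=${name%-*}      # "<type>-<backend>"
    backend=${rest#*-}   # text after the first '-'
    echo "$archive_root/$backend/$id"
}

archive_subdir /sys/fs/pstore/dmesg-erst-6625975467788730369
```

For the entry above this yields a path under /var/pstore/erst, so 
entries from different backends (erst, efi, ...) stay separate.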

[1] "Advanced Configuration and Power Interface Specification",
      version 6.2, May 2017.
      https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf

[2] "Persistent storage for a kernel's dying breath",
      March 23, 2011.
      https://lwn.net/Articles/434821/

[3] "The kernel’s command-line parameters",
      https://static.lwn.net/kerneldoc/admin-guide/kernel-parameters.html

-------------- next part --------------
[Unit]
Description=pstore archive service
Wants=network-online.target local-fs.target remote-fs.target
After=network-online.target

[Service]
Type=oneshot
StandardOutput=syslog+console
#EnvironmentFile=/etc/default/kdump-tools
#ExecStart=/etc/init.d/pstore-tools start
#ExecStop=/etc/init.d/pstore-tools stop
ExecStart=/root/pstore-tool start
ExecStop=/root/pstore-tool stop
#RemainAfterExit=yes
RemainAfterExit=no

[Install]
#WantedBy=multi-user.target
WantedBy=local-fs.target

-------------- next part --------------
#!/bin/sh
# Utility script to archive contents of pstore

#-r--r--r--. 1 root root 1826 Dec 17 10:44 dmesg-efi-154506148323001
#-r--r--r--. 1 root root 1826 Dec 17 10:44 dmesg-efi-154506148324001

pstorefs=/sys/fs/pstore
archivedir=/var/pstore/$(date +"%Y-%m-%d-%H:%M")

pstore_start()
{
    echo "PSTORE manager started"
    # Note: The -r is essential for dmesg reconstruction
    files=$(ls -r "$pstorefs"/dmesg-* 2>/dev/null)
    if [ -n "$files" ]
    then
        # Archive files
        mkdir -p "$archivedir"
        for f in $files
        do
            # Reconstruct dmesg: append each part to one combined log
            cat "$f" >> "$archivedir/dmesg.txt"
            # mv copies across filesystems, then unlinks the pstore entry
            mv -f "$f" "$archivedir"
        done
    fi
}

pstore_stop()
{
    echo "PSTORE manager stopped"
}

# Use the POSIX test; [[ ]] is a bashism and fails under a strict /bin/sh
while [ $# -gt 0 ]
do
    case $1 in
        start)
            pstore_start
            ;;
        stop)
            pstore_stop
            ;;
        *)
            echo "pstore-tool: unrecognized option: $1"
            ;;
    esac
    shift # on to next argument
done


