[systemd-devel] systemd and smartd daemon

Lennart Poettering lennart at poettering.net
Thu Apr 2 11:30:59 PDT 2015


On Thu, 02.04.15 10:43, Al Lau (lauatic at gmail.com) wrote:

> Hi,
> 
> I am seeking help and advice on troubleshooting the starting of the smartd
> daemon.  The /usr/sbin/smartd comes from the smartmontools version 6.3.
> 
> The /usr/lib/systemd/system/smartd.service file looks like this
> 
> # cat /usr/lib/systemd/system/smartd.service
> [Unit]
> Description=Self Monitoring and Reporting Technology (SMART) Daemon
> Documentation=man:smartd(8) man:smartd.conf(5)
> After=syslog.target

This line is not necessary. Nowadays all services are started with
/dev/log available. Please remove.

> 
> [Service]
> Type=forking
> PIDFile=/run/smartd.pid
> ExecStartPre=/bin/rm -f /run/smartd.pid

systemd removes PID files listed in PIDFile= automatically these days,
so this line is unnecessary.

> EnvironmentFile=/etc/sysconfig/smartmontools
> ExecStart=/usr/sbin/smartd $smartd_opts

I assume that your $smartd_opts includes any necessary switches to
tell smartd to daemonize?

> ExecReload=/bin/kill -HUP $MAINPID
> StandardOutput=syslog

This line is not necessary. For quite some time now systemd has
connected all daemon stdout/stderr to the journal anyway.

> [Install]
> WantedBy=multi-user.target
> 
> When "systemctl start smartd.service" is called, the process forks into a
> daemon.  The problem I'm seeing is that the forked process received a
> SIGTERM signal and exited.  How do I resolve this so that the forked
> process would not get terminated.
> 
> # systemctl status smartd.service
> smartd.service - Self Monitoring and Reporting Technology (SMART) Daemon
>    Loaded: loaded (/usr/lib/systemd/system/smartd.service; enabled)
>    Active: failed (Result: timeout) since Thu 2015-04-02 17:19:49 GMT; 4min
> 39s ago
>      Docs: man:smartd(8)
>            man:smartd.conf(5)
>   Process: 7456 ExecStart=/usr/sbin/smartd $smartd_opts (code=exited,
> status=0/SUCCESS)
>   Process: 7454 ExecStartPre=/bin/rm -f /run/smartd.pid (code=exited,
> status=0/SUCCESS)
>  Main PID: 1075 (code=exited, status=0/SUCCESS)
> 
> Apr 02 17:18:20 007 smartd[7456]: Device: /dev/bus/0 [megaraid_disk_19],
> opened
> Apr 02 17:18:20 007 smartd[7456]: Device: /dev/bus/0 [megaraid_disk_19],
> [TOSHIBA  MG03SCA300       5702], lu ....00 TB
> Apr 02 17:18:20 007 smartd[7456]: Device: /dev/bus/0 [megaraid_disk_19], is
> SMART capable. Adding to "monitor" list.
> Apr 02 17:18:20 007 smartd[7456]: Monitoring 0 ATA and 12 SCSI devices
> Apr 02 17:18:21 007 systemd[1]: PID file /run/smartd.pid not readable
> (yet?) after start.
> Apr 02 17:18:21 007 smartd[7463]: smartd has fork()ed into background mode.
> New PID=7463.
> Apr 02 17:19:49 007 systemd[1]: smartd.service operation timed out.
> Terminating.
> Apr 02 17:19:49 007 smartd[7463]: smartd received signal 15: Terminated
> Apr 02 17:19:49 007 systemd[1]: Failed to start Self Monitoring and
> Reporting Technology (SMART) Daemon.
> Apr 02 17:19:49 007 systemd[1]: Unit smartd.service entered failed state.
> Hint: Some lines were ellipsized, use -l to show in full.
> #
> 
> To verify, I take the "/usr/sbin/smartd $smartd_opts" and run it from the
> command line.  The smartd daemon forks and the daemon process stays up as
> expected.

The output above strongly suggests that the daemonization code in
smartd is broken.

A sysv daemon should do the following when daemonizing:

1) fork twice, exit in the middle child
2) in the grandchild (the main daemon process), set everything up,
   including writing the PID file
3) exit in the parent

It is important that 3) does not happen before 2) is complete and the
PID file exists. The output above (systemd logging "PID file
/run/smartd.pid not readable (yet?) after start") suggests that smartd
doesn't get this right, and sometimes writes the PID file after
exiting in the parent.
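
To illustrate the ordering in steps 1) to 3), here is a minimal sketch
in C. It is not smartd's actual code: the function name
daemonize_sketch() and the pipe used for synchronization are
assumptions made up for this example. The grandchild writes the PID
file first and only then notifies the original process through the
pipe, so the original process cannot exit before the PID file exists.
The other steps described in daemon(7) (closing file descriptors,
resetting signal handlers, chdir and so on) are left out for brevity.

```c
/* Sketch only: SysV-style double fork where the original process is
 * released only after the daemon has written its PID file. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 in the daemon (grandchild), the daemon's PID in the original
 * process once the PID file exists, or -1 on error.
 * daemonize_sketch() is a hypothetical name, not a libc or smartd API. */
pid_t daemonize_sketch(const char *pidfile)
{
    int ready[2];                 /* pipe: grandchild -> original process */
    if (pipe(ready) < 0)
        return -1;

    pid_t child = fork();
    if (child < 0) {
        close(ready[0]);
        close(ready[1]);
        return -1;
    }

    if (child > 0) {
        /* Original process: block until the daemon reports its PID.
         * If the daemon dies early, read() sees EOF and we return -1. */
        pid_t daemon_pid = -1;
        close(ready[1]);
        ssize_t n = read(ready[0], &daemon_pid, sizeof daemon_pid);
        close(ready[0]);
        waitpid(child, NULL, 0);  /* reap the middle child */
        return n == (ssize_t) sizeof daemon_pid ? daemon_pid : -1;
    }

    /* Middle child: new session, fork again, then exit (step 1). */
    close(ready[0]);
    if (setsid() < 0)
        _exit(EXIT_FAILURE);
    pid_t grandchild = fork();
    if (grandchild != 0)
        _exit(grandchild < 0 ? EXIT_FAILURE : EXIT_SUCCESS);

    /* Grandchild: the daemon. Set up and write the PID file (step 2). */
    pid_t self = getpid();
    FILE *f = fopen(pidfile, "w");
    if (f == NULL)
        _exit(EXIT_FAILURE);
    fprintf(f, "%ld\n", (long) self);
    if (fclose(f) != 0)
        _exit(EXIT_FAILURE);

    /* Only now let the original process continue and exit (step 3). */
    if (write(ready[1], &self, sizeof self) != (ssize_t) sizeof self)
        _exit(EXIT_FAILURE);
    close(ready[1]);
    return 0;
}
```

The caller would exit in the original process once daemonize_sketch()
returns a positive PID, and carry on as the daemon when it returns 0.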

This is a bug that needs to be fixed in smartd, and it is not specific
to systemd's way of watching daemons. With this bug, sequences like
the following are racy on a pure sysv system too:

     /etc/init.d/smartd start ; /etc/init.d/smartd stop

(because the PID file might not exist yet when the stop command tries
to read it to kill the daemon).

Also see daemon(7) for a more detailed overview.

Anyway, this really smells like a daemon bug, not a bug in your unit
file, and it needs to be fixed in the daemon.

(An alternative is to not let the daemon daemonize, but to tell it to
stay in the foreground and then use the default Type=simple. However,
this means that systemd will not wait for the daemon to be fully
initialized, which might or might not be an issue depending on the
daemon, for example if it needs to be accessible via IPC.)

Lennart

-- 
Lennart Poettering, Red Hat

