[systemd-devel] Health check for a service managed by systemd

Reindl Harald h.reindl at thelounge.net
Fri Jul 26 13:59:26 UTC 2019



On 26.07.19 at 15:37, Debraj Manna wrote:
> Thanks Reindl for replying. 
> 
> Can we make use of the watchdog & systemd-notify functionality of
> systemd? I mean something like this. 

probably you can but i doubt you gain anything

you just increase complexity and add points of failure, and i don't see
the point when you have to use curl in both cases anyway: what is the
difference to just calling "systemctl condrestart"?

write a nice log line with 'logger', using the same wording as a failed
service; i do that because a cronjob collects all those events system-wide
to trigger cron mails

don't forget 'condrestart' (it only restarts a unit that is currently
running), because it's not funny when you stop a service on purpose and
some monitoring fires it up unasked, been there with mysqld 10 years ago...
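
the two points above (log line via 'logger', then 'condrestart') can be
sketched roughly like this. it is only a minimal sketch: the function
names, the "monitor-httpd" tag and the LOGGER/SYSTEMCTL override variables
are made up for illustration, and the message wording is simply copied
from what the monitor script in this mail prints

```shell
#!/bin/bash
# Sketch only: function names, the syslog tag and the LOGGER/SYSTEMCTL
# override variables are hypothetical and not part of any existing tool.

# Build the log line; the wording imitates a failed/restarting unit.
restart_message() {
    printf '%s: Service hold-off time over, scheduling restart' "$1"
}

# Log first, then restart the unit only if it is currently running.
# LOGGER and SYSTEMCTL can be pointed at 'echo' for a dry run.
restart_if_running() {
    local unit="$1"
    "${LOGGER:-logger}" -t monitor-httpd -p daemon.err "$(restart_message "$unit")"
    "${SYSTEMCTL:-systemctl}" condrestart "$unit"
}
```

a dry run that only prints what would happen would be
"LOGGER=echo SYSTEMCTL=echo restart_if_running httpd.service"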

the stuff below is part of my httpd rpm and has worked like a charm for
years; "$max_fail_count = 3" together with the sleep is important in the
real world, because when you are under load and temporarily out of workers
it's not funny when some "crap" restarts the webserver and resets the PHP
bytecode cache all the time

---------------------------------------------------------------------

[root@testserver:~]$ systemctl status monitor-httpd
● monitor-httpd.service - Monitor/Restart Webserver
   Loaded: loaded (/usr/lib/systemd/system/monitor-httpd.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2019-07-25 04:48:57 CEST; 1 day 11h ago
 Main PID: 821 (php)
    Tasks: 1 (limit: 512)
   Memory: 3.7M
   CGroup: /system.slice/monitor-httpd.service
           └─821 /usr/bin/php -n -d display_errors=1 -d display_startup_errors=1 /usr/bin/monitor-httpd.php https://rhsoft.testserver.rhsoft.net/robots.txt

---------------------------------------------------------------------

[root@testserver:~]$ cat /usr/lib/systemd/system/monitor-httpd.service
[Unit]
Description=Monitor/Restart Webserver
After=httpd.service network-online.target
Requires=network-online.target
ConditionPathExists=/etc/sysconfig/monitor-httpd
ConditionPathExists=/usr/bin/monitor-httpd.php
ConditionPathExists=/usr/bin/php

[Service]
Type=simple
EnvironmentFile=/etc/sysconfig/monitor-httpd
ExecStart=/usr/bin/php -n -d display_errors=1 -d display_startup_errors=1 /usr/bin/monitor-httpd.php $MONITOR_URL

Restart=always
RestartSec=5
TimeoutSec=5

User=root
Group=root

CapabilityBoundingSet=CAP_KILL
MemoryDenyWriteExecute=yes
NoNewPrivileges=yes
PrivateDevices=yes
PrivateTmp=yes
ProtectControlGroups=yes
ProtectHome=yes
ProtectKernelModules=yes
ProtectKernelTunables=yes
ProtectSystem=strict

[Install]
WantedBy=multi-user.target

---------------------------------------------------------------------

[root@testserver:~]$ cat /etc/sysconfig/monitor-httpd
MONITOR_URL=https://rhsoft.testserver.rhsoft.net/robots.txt

---------------------------------------------------------------------

[root@testserver:~]$ cat /usr/bin/monitor-httpd.php
#!/usr/bin/php
<?php declare(strict_types=1);
/** make sure we are running as shell-script */
if(PHP_SAPI !== 'cli')
{
 exit("FORBIDDEN\n");
}

/** we need a test URL as parameter */
if(empty($_SERVER['argv'][1]))
{
 exit("USAGE: monitor-httpd.php <URL>\n");
}

/** do not verify certificates */
stream_context_set_default(['ssl'=>['verify_peer'=>FALSE,
'verify_peer_name'=>FALSE, 'allow_self_signed'=>TRUE]]);

/** lower default timeouts */
ini_set('default_socket_timeout', '5');

/** init vars */
$max_fail_count = 3;
$fail_count     = 0;
$last_restart   = 0;

/** service loop */
while(true)
{
 if(check_service() !== TRUE)
 {
  $fail_count++;
  sleep(3);
 }
 /** avoid false positives and too fast restarts */
 if($fail_count >= $max_fail_count && (time()-$last_restart) > 60)
 {
  echo __FILE__ . ": ERROR - httpd.service: Service hold-off time over, scheduling restart\n";
  passthru('/usr/bin/systemctl condrestart httpd.service');
  $fail_count   = 0;
  $last_restart = time();
 }
 /** sleep 10 seconds between checks */
 sleep(10);
}

/**
 * check if service is available and responds
 *
 * @access public
 * @return bool
*/
function check_service(): bool
{
 $rw = @file_get_contents($_SERVER['argv'][1]);
 if($rw === FALSE)
 {
  return FALSE;
 }
 else
 {
  return TRUE;
 }
}
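
if you don't want php for this, roughly the same loop can be sketched in
bash with curl. this is not the script from my rpm, just a minimal sketch
assuming curl is installed; MAX_FAIL_COUNT, HOLD_OFF, should_restart and
monitor_loop are names invented for the example. one deliberate difference:
the sketch resets the counter after a successful check so that only
consecutive failures count, which the php version above does not do

```shell
#!/bin/bash
# Sketch only: a bash/curl take on the loop above, not the original script.
MAX_FAIL_COUNT=3   # same threshold as the PHP version
HOLD_OFF=60        # minimum number of seconds between two restarts

# Succeeds only if the URL answers without an HTTP error (-f), ignoring
# certificate problems (-k) and giving up after 5 seconds.
check_service() {
    curl -fsk --max-time 5 -o /dev/null "$1"
}

# Arguments: fail_count last_restart now
# Succeeds when there were enough failures and the hold-off time has passed.
should_restart() {
    [ "$1" -ge "$MAX_FAIL_COUNT" ] && [ $(( $3 - $2 )) -gt "$HOLD_OFF" ]
}

monitor_loop() {
    local url="$1" fail_count=0 last_restart=0
    while true; do
        if check_service "$url"; then
            fail_count=0   # assumption: only consecutive failures count
        else
            fail_count=$((fail_count + 1))
            sleep 3
        fi
        if should_restart "$fail_count" "$last_restart" "$(date +%s)"; then
            echo "httpd.service: Service hold-off time over, scheduling restart"
            systemctl condrestart httpd.service
            fail_count=0
            last_restart=$(date +%s)
        fi
        sleep 10   # pause between checks
    done
}
```

the block only defines functions; a wrapper would end with something like
monitor_loop "$MONITOR_URL" and be started from a unit like the one above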

> [Unit]
> Description=Test service
> After=network.target
> 
> [Service]
> Type=notify
> # test.sh wrapper script to call the service
> ExecStart=/opt/test/test.sh
> Restart=always
> RestartSec=1
> TimeoutSec=5
> WatchdogSec=5
> 
> [Install]
> WantedBy=multi-user.target
> 
> Then in test.sh can we do something like 
> 
> #!/bin/bash
> trap 'kill $(jobs -p)' EXIT
> 
> # Start the actual service
> /opt/test/service &
> PID=$!
> 
> /bin/systemd-notify --ready
> while(true); do
>     FAIL=0
>     kill -0 $PID
>     if [[ $? -ne 0 ]]; then FAIL=1; fi
> 
> #    curl http://localhost/test/
> #    if [[ $? -ne 0 ]]; then FAIL=1; fi
> 
> if [[ $FAIL -eq 0 ]]; then /bin/systemd-notify WATCHDOG=1; fi
> 
>     sleep 1
> done
> 
> 
> On Fri, Jul 26, 2019 at 12:27 AM Reindl Harald <h.reindl at thelounge.net
> <mailto:h.reindl at thelounge.net>> wrote:
> 
> 
> 
>     On 25.07.19 at 20:38, Debraj Manna wrote:
>     > I have a service on a Ubuntu 16.04 which I use systemctl start, stop,
>     > restart and status to control.
>     >
>     > One time the systemctl status returned active, but the application
>     > "behind" the service responded http code different from 200.
>     >
>     > So I would like to restart the service when the http code is not 200.
>     > Can some one let me know is there a way to achieve the same via
>     systemd?
> 
>     nope, just write a separate service with a little curl magic and
>     "systemctl condrestart" and remember that you have to avoid premature
>     restarts just because of a little load peak

