[systemd-devel] Health check for a service managed by systemd

Debraj Manna subharaj.manna at gmail.com
Fri Jul 26 14:14:26 UTC 2019


Thanks Mantas and Reindl for all your suggestions.

On Fri, Jul 26, 2019 at 7:29 PM Reindl Harald <h.reindl at thelounge.net> wrote:

>
>
> On 26.07.19 at 15:37, Debraj Manna wrote:
> > Thanks Reindl for replying.
> >
> > Can we make use of the watchdog & systemd-notify functionality of
> > systemd? I mean something like this.
>
> probably you can but i doubt you gain anything
>
> you just increase complexity with additional points of errors and i
> don't see the point when you have to use curl in both cases where the
> difference to just call "systemctl condrestart" is
>
> write a nice logline with 'logger' and the same wording as a failed
> service, i do that because a cronjob collects all those events systemwide
> to trigger cron mails
>
> don't forget 'condrestart' because it's not funny when you stop a
> service on purpose and some monitoring fires it up unasked, been there
> with mysqld 10 years ago.....
>
> the script below is part of my httpd rpm and has worked like a charm for
> years; "$max_fail_count = 3" with the sleep is important in the real world
> because when you are under load and temporarily out of workers it's not
> funny when some "crap" restarts the webserver and resets the PHP bytecode
> cache all the time
>
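A minimal shell sketch of the logic described above: probe with curl, count consecutive failures, and restart only after several misses and a hold-off period. The function names, the thresholds as shell variables and the log wording are assumptions for illustration and are not taken from the httpd rpm. `try-restart` is the native systemctl verb for which `condrestart` is a compatibility alias.

```shell
#!/bin/sh
# Sketch only: function and variable names are hypothetical.

MAX_FAIL_COUNT=3      # consecutive failed probes tolerated before restarting
MIN_RESTART_GAP=60    # minimum number of seconds between two restarts

fail_count=0
last_restart=0

# Probe the URL; curl --fail exits non-zero on HTTP status 400 or above,
# on connection errors, and when the 5 second limit is exceeded.
check_url() {
    curl --silent --fail --max-time 5 --output /dev/null "$1"
}

# Update fail_count from a probe exit status ($1); a success resets it.
record_result() {
    if [ "$1" -eq 0 ]; then
        fail_count=0
    else
        fail_count=$((fail_count + 1))
    fi
}

# Succeed when enough probes failed in a row and the previous restart is
# more than MIN_RESTART_GAP seconds older than the timestamp in $1.
restart_due() {
    [ "$fail_count" -ge "$MAX_FAIL_COUNT" ] &&
        [ $(($1 - last_restart)) -gt "$MIN_RESTART_GAP" ]
}

# Endless monitor loop: $1 is the URL to probe, $2 the unit to restart.
monitor_loop() {
    while true; do
        check_url "$1"
        record_result "$?"
        if restart_due "$(date +%s)"; then
            logger "monitor: $2 failed $fail_count checks, scheduling restart"
            # try-restart leaves a deliberately stopped unit alone
            systemctl try-restart "$2"
            fail_count=0
            last_restart=$(date +%s)
        fi
        sleep 10
    done
}
```

Calling `monitor_loop <URL> httpd.service` from a small wrapper started by ExecStart= would run the loop; nothing happens until that function is invoked.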
> ---------------------------------------------------------------------
>
> [root@testserver:~]$ systemctl status monitor-httpd
> ● monitor-httpd.service - Monitor/Restart Webserver
>    Loaded: loaded (/usr/lib/systemd/system/monitor-httpd.service; enabled; vendor preset: disabled)
>    Active: active (running) since Thu 2019-07-25 04:48:57 CEST; 1 day 11h ago
>  Main PID: 821 (php)
>     Tasks: 1 (limit: 512)
>    Memory: 3.7M
>    CGroup: /system.slice/monitor-httpd.service
>            └─821 /usr/bin/php -n -d display_errors=1 -d display_startup_errors=1 /usr/bin/monitor-httpd.php https://rhsoft.testserver.rhsoft.net/robots.txt
>
> ---------------------------------------------------------------------
>
> [root@testserver:~]$ cat /usr/lib/systemd/system/monitor-httpd.service
> [Unit]
> Description=Monitor/Restart Webserver
> After=httpd.service network-online.target
> Requires=network-online.target
> ConditionPathExists=/etc/sysconfig/monitor-httpd
> ConditionPathExists=/usr/bin/monitor-httpd.php
> ConditionPathExists=/usr/bin/php
>
> [Service]
> Type=simple
> EnvironmentFile=/etc/sysconfig/monitor-httpd
> ExecStart=/usr/bin/php -n -d display_errors=1 -d display_startup_errors=1 /usr/bin/monitor-httpd.php $MONITOR_URL
>
> Restart=always
> RestartSec=5
> TimeoutSec=5
>
> User=root
> Group=root
>
> CapabilityBoundingSet=CAP_KILL
> MemoryDenyWriteExecute=yes
> NoNewPrivileges=yes
> PrivateDevices=yes
> PrivateTmp=yes
> ProtectControlGroups=yes
> ProtectHome=yes
> ProtectKernelModules=yes
> ProtectKernelTunables=yes
> ProtectSystem=strict
>
> [Install]
> WantedBy=multi-user.target
>
> ---------------------------------------------------------------------
>
> [root@testserver:~]$ cat /etc/sysconfig/monitor-httpd
> MONITOR_URL=https://rhsoft.testserver.rhsoft.net/robots.txt
>
> ---------------------------------------------------------------------
>
> [root@testserver:~]$ cat /usr/bin/monitor-httpd.php
> #!/usr/bin/php
> <?php declare(strict_types=1);
> /** make sure we are running as shell-script */
> if(PHP_SAPI !== 'cli')
> {
>  exit("FORBIDDEN\n");
> }
>
> /** we need a test URL as param */
> if(empty($_SERVER['argv'][1]))
> {
>  exit("USAGE: monitor-httpd.php <URL>\n");
> }
>
> /** do not verify certificates */
> stream_context_set_default(['ssl'=>['verify_peer'=>FALSE,
> 'verify_peer_name'=>FALSE, 'allow_self_signed'=>TRUE]]);
>
> /** lower default timeouts */
> ini_set('default_socket_timeout', '5');
>
> /** init vars */
> $max_fail_count = 3;
> $fail_count     = 0;
> $last_restart   = 0;
>
> /** service loop */
> while(true)
> {
>  if(check_service() !== TRUE)
>  {
>   $fail_count++;
>   sleep(3);
>  }
>  else
>  {
>   /** a successful check ends the failure streak */
>   $fail_count = 0;
>  }
>  /** avoid false positives and too fast restarts */
>  if($fail_count >= $max_fail_count && (time()-$last_restart) > 60)
>  {
>   echo __FILE__ . ": ERROR - httpd.service: Service hold-off time over, scheduling restart\n";
>   passthru('/usr/bin/systemctl condrestart httpd.service');
>   $fail_count   = 0;
>   $last_restart = time();
>  }
>  /** sleep 10 seconds between checks */
>  sleep(10);
> }
>
> /**
>  * check if service is available and responds
>  *
>  * @access public
>  * @return bool
> */
> function check_service(): bool
> {
>  /** any readable response counts as success, warnings are suppressed */
>  return @file_get_contents($_SERVER['argv'][1]) !== FALSE;
> }
>
> > [Unit]
> > Description=Test service
> > After=network.target
> >
> > [Service]
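> > Type=notify
> > # systemd-notify runs as a child of test.sh, so notifications from
> > # processes other than the main PID have to be allowed
> > NotifyAccess=all
> > # test.sh wrapper script to call the service
> > ExecStart=/opt/test/test.sh
> > Restart=always
> > RestartSec=1
> > TimeoutSec=5
> > WatchdogSec=5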
> >
> > [Install]
> > WantedBy=multi-user.target
> >
> > Then in test.sh can we do something like
> >
> > #!/bin/bash
> > # Stop the background service when this wrapper exits
> > trap 'kill $(jobs -p)' EXIT
> >
> > # Start the actual service
> > /opt/test/service &
> > PID=$!
> >
> > /bin/systemd-notify --ready
> > while true; do
> >     FAIL=0
> >     # kill -0 sends no signal, it only tests that the process exists
> >     if ! kill -0 "$PID" 2>/dev/null; then FAIL=1; fi
> >
> > #    curl http://localhost/test/
> > #    if [[ $? -ne 0 ]]; then FAIL=1; fi
> >
> >     # Only ping the watchdog while every check passes
> >     if [[ $FAIL -eq 0 ]]; then /bin/systemd-notify WATCHDOG=1; fi
> >
> >     sleep 1
> > done
> >
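If the watchdog route is taken anyway, two details are worth keeping in mind. systemd only accepts messages from the separate systemd-notify process when the unit sets NotifyAccess= accordingly (for example NotifyAccess=all), and its man page warns that messages from such short-lived helper processes may not always be attributed to the unit. The service manager also exports the configured timeout as WATCHDOG_USEC, so the script does not need to hard-code its ping interval. A minimal sketch, assuming hypothetical helper names and the usual convention of pinging at half the timeout:

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Print the number of seconds to wait between watchdog pings: half of the
# timeout systemd passes in WATCHDOG_USEC (microseconds), but at least 1.
watchdog_interval() {
    usec=${WATCHDOG_USEC:-0}
    half=$((usec / 2000000))
    if [ "$half" -lt 1 ]; then
        half=1
    fi
    echo "$half"
}

# Send one keep-alive ping, but only while the supervised PID ($1) exists.
ping_if_alive() {
    if kill -0 "$1" 2>/dev/null; then
        systemd-notify WATCHDOG=1
    fi
}
```

Inside the loop of test.sh this could replace the fixed one-second sleep, along the lines of `ping_if_alive "$PID"; sleep "$(watchdog_interval)"`.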
> >
> > On Fri, Jul 26, 2019 at 12:27 AM Reindl Harald <h.reindl at thelounge.net> wrote:
> >
> >     On 25.07.19 at 20:38, Debraj Manna wrote:
> >     > I have a service on Ubuntu 16.04 which I control with systemctl
> >     > start, stop, restart and status.
> >     >
> >     > One time systemctl status returned active, but the application
> >     > "behind" the service responded with an HTTP code other than 200.
> >     >
> >     > So I would like to restart the service when the HTTP code is not 200.
> >     > Can someone let me know if there is a way to achieve this via
> >     > systemd?
> >
> >     nope, just write a separate service with a little curl magic and
> >     "systemctl condrestart" and remember that you have to avoid premature
> >     restarts just because of a little load peak
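
For the original requirement (restart on anything other than HTTP 200), the "curl magic" can be as small as the sketch below. The function names are hypothetical. curl's `--write-out '%{http_code}'` prints the numeric status, or 000 when no response was received.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# Print the HTTP status code returned for URL $1 (000 if no answer arrives
# within 5 seconds).
http_status() {
    curl --silent --output /dev/null --max-time 5 \
         --write-out '%{http_code}' "$1"
}

# Succeed only for status 200, as asked for in the original question.
status_ok() {
    [ "$1" = "200" ]
}

# One health check: restart unit $2 if URL $1 does not answer with 200.
# try-restart (alias: condrestart) does nothing when the unit is stopped.
check_once() {
    if ! status_ok "$(http_status "$1")"; then
        systemctl try-restart "$2"
    fi
}
```

As written, check_once acts on the very first miss; following the advice about load peaks, a real monitor would require several failed checks in a row before restarting.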