[systemd-devel] timed events

Thu Jun 28 23:01:46 PDT 2012

2012/6/29 Kok, Auke-jan H <auke-jan.h.kok at intel.com>:
> On Fri, Jun 29, 2012 at 12:49 AM, Nathan <qwerty.nat at gmail.com> wrote:
>> Another issue (though slightly related) is we have an external binary
>> that when run will return 0 or 1 depending on whether we should run a
>> service. Is there a way to run this command in service_name.service and
>> start the service if it returns 0 and stop the service if it
>> returns 1 (retrying the script every 5 minutes or so)?
>
> cheap trick: make a script and run it from a timer, have the script
> run `systemctl ...`
>
> better trick: fix the daemon to do all of this properly.

Hello. The company I work for has a similar need. The director has
permitted me to disclose the details in full, in the hope that this
helps you understand the use case and why "fix the daemon" is not a
possible solution in our case. We are not using systemd on our servers
yet, but this doesn't make the problem statement invalid.

We have several servers hosted at different ISPs, and our own
autonomous system. The service is provided to our clients via IPv4
anycast. So, at each of the servers, we run bgpd (from quagga) and
announce a route to our own IPv4 block. This means that each client
will be routed to the nearest (in the BGP sense) server. It also
protects our service against outages that affect an entire ISP, and
allows us to perform maintenance and software upgrades safely (i.e.
with near-zero visible downtime for clients) by stopping bgpd first.

The issue is that twice in the company's lifetime there was a payment
problem with one of the servers. When this happened, the ISP did not
shut down the affected server. Instead, they somehow firewalled the
packets destined for it, but left the BGP session intact. End result:
the route was still announced into the global routing table but did
not work, and some clients saw a service interruption. So, as a
protection against such mistakes, we need some form of custom dead
man's switch that would stop bgpd if none of the test IPv4 addresses
is pingable.
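
A minimal sketch of such a check, in Python, under stated assumptions:
the `ping` flags are those of the Linux iputils `ping`, and the names
`ping_once` and `any_reachable` are made up for illustration. The
switch should fire (stop bgpd) only when `any_reachable` returns False.

```python
import subprocess
from typing import Callable, Iterable


def ping_once(address: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo request; True if a reply arrived.

    Assumes the Linux iputils `ping` (-c count, -W timeout in seconds).
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def any_reachable(
    addresses: Iterable[str],
    probe: Callable[[str], bool] = ping_once,
) -> bool:
    """True if at least one test address answers the probe.

    Stops at the first address that answers.
    """
    return any(probe(address) for address in addresses)
```

The probe is a parameter so that the logic can be exercised without
sending real packets, e.g. `any_reachable(addrs, probe=fake_probe)`.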

Of course, this monitoring need is specific to our use case, and other
companies will either not need it at all or write a dead man's switch
with different logic.

So the logic, as I understand it, should be as follows: run bgpd if
the administrator has not prohibited this due to maintenance or
similar reasons, and the periodically-executed (?) dead-man's-switch
script doesn't say that bgpd should not run.
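
Written out as a sketch (Python; the function and parameter names are
hypothetical), the rule above is a plain conjunction of two independent
conditions:

```python
def bgpd_should_run(admin_allows: bool, switch_ok: bool) -> bool:
    """Decide whether bgpd should be running.

    admin_allows: False while the administrator has prohibited bgpd
                  (maintenance or similar reasons).
    switch_ok:    False when the dead man's switch says bgpd must not run.
    """
    return admin_allows and switch_ok


if __name__ == "__main__":
    # Print the full truth table for the two inputs.
    for admin_allows in (True, False):
        for switch_ok in (True, False):
            print(admin_allows, switch_ok,
                  bgpd_should_run(admin_allows, switch_ok))
```

The point of keeping the two inputs separate is that neither one should
be able to override the other's "no".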

The "run systemctl from a timer" trick is close, but not close enough:
extra care is needed during maintenance periods to disable the dead
man's switch script (so it doesn't restart bgpd contrary to the
administrator's decision) and not to forget to re-enable it later.
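
To illustrate the weakness, here is a rough sketch of what the
timer-driven script might look like, in Python. Everything specific in
it is an assumption: the unit name `bgpd.service`, the flag file path
`/run/bgpd-maintenance`, and the helper names. The flag file is exactly
the piece that the administrator has to remember to create and remove.

```python
import os
import subprocess
from typing import Callable, List

MAINTENANCE_FLAG = "/run/bgpd-maintenance"  # hypothetical flag file
UNIT = "bgpd.service"                       # assumed unit name


def plan_action(flag_present: bool, switch_ok: bool) -> List[str]:
    """Return the systemctl command to run, or [] to leave bgpd alone."""
    if flag_present:
        # Maintenance: never touch bgpd, whatever the switch says.
        return []
    verb = "start" if switch_ok else "stop"
    return ["systemctl", verb, UNIT]


def run_once(
    switch_ok: bool,
    flag_path: str = MAINTENANCE_FLAG,
    runner: Callable[..., object] = subprocess.run,
) -> List[str]:
    """One timer tick: check the flag file, then act on the switch result."""
    command = plan_action(os.path.exists(flag_path), switch_ok)
    if command:
        runner(command, check=False)
    return command
```

If the flag file is left behind after maintenance, the switch is
silently disabled; if it is never created, the next tick restarts bgpd
against the administrator's wishes.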

-- 
Alexander E. Patrakov