[systemd-devel] timed events

Kok, Auke-jan H auke-jan.h.kok at intel.com
Thu Jun 28 23:46:54 PDT 2012


On Thu, Jun 28, 2012 at 11:01 PM, Alexander E. Patrakov
<patrakov at gmail.com> wrote:
> 2012/6/29 Kok, Auke-jan H <auke-jan.h.kok at intel.com>:
>> On Fri, Jun 29, 2012 at 12:49 AM, Nathan <qwerty.nat at gmail.com> wrote:
>>> Another issue (though slightly related) is that we have an external binary
>>> that, when run, returns 0 or 1 depending on whether we should run a service.
>>> Is there a way to run this command from service_name.service, start the
>>> service if it returns 0, and stop the service if it returns 1
>>> (retrying the script every 5 minutes or so)?
>>
>> cheap trick: make a script and run it from a timer, have the script
>> run `systemctl ...`
>>
>> better trick: fix the daemon to do all of this properly.
>
> Hello. The company I work for has a similar need. The director has
> permitted me to disclose the details in full, in hope that this will
> permit you to understand the use case better and understand why "fix
> the daemon" is not a possible solution in our case. We are not using
> systemd yet on our servers, but this doesn't make the problem
> statement invalid.
>
> We have several servers hosted at different ISPs, and our own
> autonomous system. The service is provided to our clients via IPv4
> anycast. So, at each of the servers, we run bgpd (from quagga) and
> announce a route to our own IPv4 block. This means that each client
> will be routed to the nearest (in the BGP sense) server. It also
> protects our service against outages that affect the entire ISP, and
> allows us to perform maintenance and software upgrades safely (i.e.
> with near zero visible downtime for clients) by stopping bgpd first.
>
> The issue is that twice in the company's lifetime there was a payment
> problem with one of the servers. When this happened, the ISP did not
> shut down the affected server. Instead, they somehow firewalled the
> packets destined to it, but the BGP session was left intact. End
> result: the route is still announced into the global routing table,
> but doesn't work, and some clients see service interruption. So, as a
> protection against such mistakes, we need some form of a custom dead
> man's switch that would stop bgpd if none of the test IPv4 addresses
> is pingable.
>
> Of course, such monitoring need is specific to our use case, and other
> companies will either not need it at all or write a dead man's switch
> with a different logic.
>
> So the logic, as I understand it, should be as follows: run bgpd if
> the administrator has not prohibited this due to maintenance or
> similar reasons, and the periodically-executed (?) dead-man's-switch
> script doesn't say that bgpd should not run.
>
> The "run systemctl from a timer" approach is close, but not close
> enough: extra care is needed during maintenance periods to disable the
> dead man's switch script (so it doesn't restart bgpd contrary to the
> administrator's decision) and not to forget to reenable it later.

Nothing a sticky note on a monitor couldn't fix.

A real solution would be to use some sort of heartbeat feature, or
just wrap bgpd in a wrapper program that takes care of starting and
stopping it. That lets you keep the wrapper running under systemd at
all times. No timers needed.
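
A minimal sketch of the wrapper idea, in Python, to make it concrete.
Everything here is an assumption for illustration: the function names
(any_reachable, next_action, supervise) are made up, the ping flags
are the Linux iputils ones, the 300 second interval just mirrors the
"every 5 minutes or so" from earlier in the thread, and the supervised
command is assumed to stay in the foreground. Signal handling and
logging are left out.

```python
import subprocess
import time


def any_reachable(hosts, ping=None):
    """Return True if at least one of the test hosts answers a ping."""
    if ping is None:
        def ping(host):
            # Assumption: Linux iputils ping; one echo request, 2 s timeout.
            return subprocess.call(
                ["ping", "-c", "1", "-W", "2", host],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            ) == 0
    return any(ping(host) for host in hosts)


def next_action(child_running, healthy):
    """Decide what the wrapper should do on this tick."""
    if healthy and not child_running:
        return "start"
    if not healthy and child_running:
        return "stop"
    return "none"


def supervise(command, check, interval=300, max_ticks=None):
    """Keep `command` running only while `check()` returns True.

    `max_ticks` exists only so the loop can be bounded in a test;
    leave it as None to run until the wrapper itself is stopped.
    """
    child = None
    ticks = 0
    try:
        while max_ticks is None or ticks < max_ticks:
            running = child is not None and child.poll() is None
            action = next_action(running, check())
            if action == "start":
                child = subprocess.Popen(command)
            elif action == "stop":
                child.terminate()
                child.wait()
            ticks += 1
            if max_ticks is None or ticks < max_ticks:
                time.sleep(interval)
    finally:
        # Never leave the child running once the wrapper exits normally.
        if child is not None and child.poll() is None:
            child.terminate()
            child.wait()
```

The wrapper would then be what the service unit starts, for example
with something like
supervise(["/usr/sbin/bgpd"], lambda: any_reachable(["192.0.2.1"])),
where the bgpd path and the test address are placeholders. Stopping
the wrapper unit for maintenance stops both the check and bgpd, so
there is no separate timer to remember to disable and reenable.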

Auke

