<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body bgcolor="#ffffff" text="#000000">
<div class="moz-text-flowed" style="font-family: -moz-fixed;
font-size: 12px;" lang="x-unicode">On 02/08/2011 12:34 PM, Lennart
Poettering wrote:
<br>
<blockquote type="cite" style="color: rgb(0, 0, 0);">On Mon,
07.02.11 10:01, Cristian Patrascu (<a
class="moz-txt-link-abbreviated"
href="mailto:Cristian.Patrascu@windriver.com">Cristian.Patrascu@windriver.com</a>)
wrote:
<br>
<br>
<blockquote type="cite" style="color: rgb(0, 0, 0);">On
02/03/2011 11:19 PM, Michael Biebl wrote:
<br>
<blockquote type="cite" style="color: rgb(0, 0, 0);">2011/2/3
Tomasz Torcz<a class="moz-txt-link-rfc2396E"
href="mailto:tomek@pipebreaker.pl"><tomek@pipebreaker.pl></a>:
<br>
<blockquote type="cite" style="color: rgb(0, 0, 0);">On Thu,
Feb 03, 2011 at 09:35:55PM +0100, Michael Biebl wrote:
<br>
<blockquote type="cite" style="color: rgb(0, 0, 0);">ExecStartOnFailure=/some/cmd
(send an email or what not)
<br>
</blockquote>
Isn't “OnFailure=send-email-to-admin.service” sufficent?
<br>
</blockquote>
Having to use a service file for this use case is imho
cumbersome,
<br>
especially since you can't pass information to that service
afaik,
<br>
e.g. via env variables.
<br>
So this send-email-to-admin.service doesn't know which
service it was,
<br>
that failed and its last status code. Or am I misinformed
here?
<br>
<br>
Also, does OnFailure= work as I described with regard to
Restart=on-failure?
<br>
<br>
Michael
<br>
</blockquote>
On 02/03/2011 10:35 PM, Michael Biebl wrote:
<br>
<blockquote type="cite" style="color: rgb(0, 0, 0);">ExecStartOnFailure=/some/cmd
(send an email or what not)
<br>
<br>
If Restart=on-failure is not set, this command is executed
when the
<br>
starting the service has failed.
<br>
</blockquote>
2011/2/3 Cristian Patrascu<a class="moz-txt-link-rfc2396E"
href="mailto:Cristian.Patrascu@windriver.com"><Cristian.Patrascu@windriver.com></a>:
<br>
<blockquote type="cite" style="color: rgb(0, 0, 0);">-
RestartRetries=n (where n is the number of restart retries)
<br>
</blockquote>
Using "Restart=on-failure" and
"OnFailure=some-other.service", the
<br>
specified service will be started for every restart that exits
with
<br>
fail, of current service.
<br>
</blockquote>
I think tis current behaviour probably makes little sense, so my
<br>
suggestion would be to simply change it and run the services
listed in
<br>
OnFailure= only after we tried to restart the service and gave
up.
<br>
<br>
So here's what I'd suggest:
<br>
<br>
a) Introduce RestartRetries= as you suggested
<br>
<br>
b) Reshuffle things so that OnFailure= is run only after
RestartRetries=
<br>
is reached when it is set or each time if RestartRetries= is
not set.
<br>
<br>
c) Do not introduce ExecPostRestarts=
<br>
<br>
I think this would be a very simple, minimal and natural
extension and
<br>
should do everything you need, right? I'd be happy to merge such
a patch!
<br>
<br>
Lennart
<br>
</blockquote>
Patch is attached, sorry for answering so late.
<br>
<br>
Functionality:
<br>
In case a service failed to start properly (it exited with a
code
<br>
not 0 - not successfull) it has to be retried, to be restarted a
<br>
predefined number of times, until it ends successfully or until
the
<br>
predefined number of restarts is reached. For that, there is the
option
<br>
"MaxRestartRetries" wich accepts an integer = number of restarts.
This
<br>
applies only to ".service" units.
<br>
<br>
So, if a service has "MaxRestartRetries=<x>" and option
<br>
"Restart=no" or not set: it should be restarted maximum "x" times,
if it
<br>
exits unsuccessfully every time; if service has
"Restart=on-success"
<br>
then it will allways be restarted on successfull exit and maximum
"x"
<br>
restarts with unsuccessfull exits. For "Restart=always" it will
restart
<br>
"x" times max. if exited unsuccessfully and for successfull exit
always
<br>
restart. Otherwise for "Restart=on-abort" it's the same
functionality.
<br>
<br>
After "x" restarts have happened, the "OnFailure" unit will
start
<br>
(if specified) but only after all unsuccessfull restarts (after
"x"
<br>
restarts reached).
<br>
<br>
This patch is made for systemd-29, on master branch with SHA1 ID:
<br>
3661ac04b4f2840d3345605aa35963bbde3c469d
<br>
<br>
</div>
<div class="moz-text-plain" wrap="true" style="font-family:
-moz-fixed; font-size: 12px;" lang="x-western">
<pre wrap="">________systemd-restart-on-fail.patch : _______________________________
Index: systemd-29/src/dbus-service.c
===================================================================
--- systemd-29.orig/src/dbus-service.c        2011-06-22 10:47:14.000000000 +0300
+++ systemd-29/src/dbus-service.c        2011-06-22 13:48:51.292321742 +0300
@@ -42,6 +42,8 @@
" <property name=\"PIDFile\" type=\"s\" access=\"read\"/>\n" \
" <property name=\"NotifyAccess\" type=\"s\" access=\"read\"/>\n" \
" <property name=\"RestartUSec\" type=\"t\" access=\"read\"/>\n" \
+ " <property name=\"MaxRestartRetries\" type=\"i\" access=\"read\"/>\n" \
+ " <property name=\"RestartRetry\" type=\"i\" access=\"read\"/>\n" \
" <property name=\"TimeoutUSec\" type=\"t\" access=\"read\"/>\n" \
BUS_EXEC_COMMAND_INTERFACE("ExecStartPre") \
BUS_EXEC_COMMAND_INTERFACE("ExecStart") \
@@ -103,6 +105,8 @@
{ "org.freedesktop.systemd1.Service", "PIDFile", bus_property_append_string, "s", u->service.pid_file },
{ "org.freedesktop.systemd1.Service", "NotifyAccess", bus_service_append_notify_access, "s", &u->service.notify_access },
{ "org.freedesktop.systemd1.Service", "RestartUSec", bus_property_append_usec, "t", &u->service.restart_usec },
+ { "org.freedesktop.systemd1.Service", "MaxRestartRetries", bus_property_append_int, "i", &u->service.restart_retries },
+ { "org.freedesktop.systemd1.Service", "RestartRetry", bus_property_append_int, "i", &u->service.restart_crt_retry },
{ "org.freedesktop.systemd1.Service", "TimeoutUSec", bus_property_append_usec, "t", &u->service.timeout_usec },
BUS_EXEC_COMMAND_PROPERTY("org.freedesktop.systemd1.Service", u->service.exec_command[SERVICE_EXEC_START_PRE], "ExecStartPre"),
BUS_EXEC_COMMAND_PROPERTY("org.freedesktop.systemd1.Service", u->service.exec_command[SERVICE_EXEC_START], "ExecStart"),
Index: systemd-29/src/load-fragment.c
===================================================================
--- systemd-29.orig/src/load-fragment.c        2011-06-22 10:47:14.000000000 +0300
+++ systemd-29/src/load-fragment.c        2011-06-22 13:48:51.292321742 +0300
@@ -1951,6 +1951,7 @@
{ "TimeoutSec", config_parse_usec, 0, &u->service.timeout_usec, "Service" },
{ "Type", config_parse_service_type, 0, &u->service.type, "Service" },
{ "Restart", config_parse_service_restart, 0, &u->service.restart, "Service" },
+ { "MaxRestartRetries", config_parse_int, 0, &u->service.restart_retries, "Service" },
{ "PermissionsStartOnly", config_parse_bool, 0, &u->service.permissions_start_only, "Service" },
{ "RootDirectoryStartOnly", config_parse_bool, 0, &u->service.root_directory_start_only, "Service" },
{ "RemainAfterExit", config_parse_bool, 0, &u->service.remain_after_exit, "Service" },
Index: systemd-29/src/service.c
===================================================================
--- systemd-29.orig/src/service.c        2011-06-22 10:47:14.000000000 +0300
+++ systemd-29/src/service.c        2011-06-22 13:48:51.296321774 +0300
@@ -113,6 +113,8 @@
s->timeout_usec = DEFAULT_TIMEOUT_USEC;
s->restart_usec = DEFAULT_RESTART_USEC;
s->timer_watch.type = WATCH_INVALID;
+ s->restart_retries = DEFAULT_RESTART_RETRIES;
+ s->restart_crt_retry = 0;
#ifdef HAVE_SYSV_COMPAT
s->sysv_start_priority = -1;
s->sysv_start_priority_from_rcnd = -1;
@@ -1151,6 +1153,22 @@
if (s->meta.default_dependencies)
if ((r = service_add_default_dependencies(s)) < 0)
return r;
+
+ /* If the option "RestartRetries=" is set then the service will be "revived" */
+ if (s->restart_retries != DEFAULT_RESTART_RETRIES){
+ /* If the option "Restart=" is not set, then for the service to be "revived"
+ we must set "Restart=" to "on-failure" */
+ if (s->restart == SERVICE_RESTART_NO){
+ s->restart = SERVICE_RESTART_ON_FAILURE;
+ }
+
+ /* If the option "Restart=" is already set, then for the service to be "revived"
+ and be restarted on failure we must set "Restart=" to "always"
+ only if it's "on-success" */
+ else if (s->restart == SERVICE_RESTART_ON_SUCCESS){
+ s->restart = SERVICE_RESTART_ALWAYS;
+ }
+ }
}
return service_verify(s);
@@ -2291,6 +2309,7 @@
/* Make sure we don't enter a busy loop of some kind. */
if (!ratelimit_test(&s->ratelimit)) {
+ s->restart_crt_retry = 0;
log_warning("%s start request repeated too quickly, refusing to start.", u->meta.id);
return -ECANCELED;
}
@@ -2311,6 +2330,7 @@
/* This is a user request, so don't do restarts on this
* shutdown. */
+ s->restart_crt_retry = 0;
s->forbid_restart = true;
/* Already on it */
@@ -2604,6 +2624,11 @@
* gone. */
s->main_command = NULL;
+ if (success && s->restart_crt_retry){
+ log_debug("%s changed restart_crt_retry from %d to 0", u->meta.id, s->restart_crt_retry);
+ s->restart_crt_retry = 0;
+ }
+
switch (s->state) {
case SERVICE_START_POST:
@@ -2832,7 +2857,15 @@
break;
case SERVICE_AUTO_RESTART:
- log_info("%s holdoff time over, scheduling restart.", u->meta.id);
+ if (0 <= s->restart_retries){
+ s->restart_crt_retry ++;
+ if (s->restart_crt_retry > s->restart_retries){
+ log_warning("%s maximum number of restarting retries reached. Stopping.", u->meta.id);
+ service_enter_dead(s, true, false);
+ break;
+ }
+ }
+ log_info("%s holdoff time over, scheduling restart (retry %d of %d).", u->meta.id, s->restart_crt_retry, s->restart_retries);
service_enter_restart(s);
break;
Index: systemd-29/src/service.h
===================================================================
--- systemd-29.orig/src/service.h        2011-06-22 10:47:14.000000000 +0300
+++ systemd-29/src/service.h        2011-06-22 13:48:51.296321774 +0300
@@ -92,6 +92,9 @@
ServiceType type;
ServiceRestart restart;
+ int restart_retries;
+ int restart_crt_retry;
+
/* If set we'll read the main daemon PID from this file */
char *pid_file;
Index: systemd-29/src/unit.c
===================================================================
--- systemd-29.orig/src/unit.c        2011-06-22 10:47:14.000000000 +0300
+++ systemd-29/src/unit.c        2011-06-22 13:48:51.296321774 +0300
@@ -1114,6 +1114,10 @@
if (set_size(u->meta.dependencies[UNIT_ON_FAILURE]) <= 0)
return;
+ if (u->meta.type == UNIT_SERVICE &&
+ u->service.restart_crt_retry <= u->service.restart_retries)
+ return;
+
log_info("Triggering OnFailure= dependencies of %s.", u->meta.id);
SET_FOREACH(other, u->meta.dependencies[UNIT_ON_FAILURE], i) {
@@ -1245,6 +1249,17 @@
log_notice("Unit %s entered failed state.", u->meta.id);
unit_trigger_on_failure(u);
}
+
+ /* When a unit is of type service and has finished,
+ reset restart-retry counter, if applicable */
+ if (ns != os &&
+ u->meta.type == UNIT_SERVICE &&
+ UNIT_IS_INACTIVE_OR_FAILED(ns) &&
+ u->service.restart_crt_retry > u->service.restart_retries ){
+ log_debug("Unit %s entered inactive or failed, restart retries at max, reset counter (%d -> 0)!",
+ u->meta.id, u->service.restart_crt_retry, u->service.restart_retries);
+ u->service.restart_crt_retry = 0;
+ }
}
/* Some names are special */
Index: systemd-29/src/unit.h
===================================================================
--- systemd-29.orig/src/unit.h        2011-06-22 10:47:14.000000000 +0300
+++ systemd-29/src/unit.h        2011-06-22 13:48:51.296321774 +0300
@@ -40,6 +40,9 @@
#include "execute.h"
#include "condition.h"
+/* default = infinite retries (= systemd behaviour not patched) */
+#define DEFAULT_RESTART_RETRIES -1
+
enum UnitType {
UNIT_SERVICE = 0,
UNIT_SOCKET,
</pre>
</div>
</body>
</html>