[systemd-devel] [PATCH] systemd restart patch

Cristian Patrascu Cristian.Patrascu at windriver.com
Fri Jul 1 05:35:36 PDT 2011


On 02/08/2011 12:34 PM, Lennart Poettering wrote:
> On Mon, 07.02.11 10:01, Cristian Patrascu 
> (Cristian.Patrascu at windriver.com) wrote:
>
>> On 02/03/2011 11:19 PM, Michael Biebl wrote:
>>> 2011/2/3 Tomasz Torcz<tomek at pipebreaker.pl>:
>>>> On Thu, Feb 03, 2011 at 09:35:55PM +0100, Michael Biebl wrote:
>>>>> ExecStartOnFailure=/some/cmd (send an email or what not)
>>>>   Isn't "OnFailure=send-email-to-admin.service" sufficent?
>>> Having to use a service file for this use case is imho cumbersome,
>>> especially since you can't pass information to that service afaik,
>>> e.g. via env variables.
>>> So this send-email-to-admin.service doesn't know which service it was,
>>> that failed and its last status code. Or am I misinformed here?
>>>
>>> Also, does OnFailure= work as I described with regard to 
>>> Restart=on-failure?
>>>
>>> Michael
>> On 02/03/2011 10:35 PM, Michael Biebl wrote:
>>> ExecStartOnFailure=/some/cmd (send an email or what not)
>>>
>>> If Restart=on-failure is not set, this command is executed when the
>>> starting the service has failed.
>> 2011/2/3 Cristian Patrascu<Cristian.Patrascu at windriver.com>:
>>> - RestartRetries=n (where n is the number of restart retries)
>> Using "Restart=on-failure" and "OnFailure=some-other.service",  the
>> specified service will be started for every restart that exits with
>> fail, of current service.
> I think tis current behaviour probably makes little sense, so my
> suggestion would be to simply change it and run the services listed in
> OnFailure= only after we tried to restart the service and gave up.
>
> So here's what I'd suggest:
>
> a) Introduce RestartRetries= as you suggested
>
> b) Reshuffle things so that OnFailure= is run only after RestartRetries=
>     is reached when it is set or each time if RestartRetries= is not set.
>
> c) Do not introduce ExecPostRestarts=
>
> I think this would be a very simple, minimal and natural extension and
> should do everything you need, right? I'd be happy to merge such a patch!
>
> Lennart
Patch is attached, sorry for answering so late.

Functionality:
      In case a service failed to start properly (it exited with a code
not 0 - not successfull) it has to be retried, to be restarted a
predefined number of times, until it ends successfully or until the
predefined number of restarts is reached. For that, there is the option
"MaxRestartRetries" wich accepts an integer = number of restarts. This
applies only to ".service" units.

      So, if a service has "MaxRestartRetries=<x>" and option
"Restart=no" or not set: it should be restarted maximum "x" times, if it
exits unsuccessfully every time; if service has "Restart=on-success"
then it will allways be restarted on successfull exit and maximum "x"
restarts with unsuccessfull exits. For "Restart=always" it will restart
"x" times max. if exited unsuccessfully and for successfull exit always
restart. Otherwise for "Restart=on-abort" it's the same functionality.

      After "x" restarts have happened, the "OnFailure" unit will start
(if specified) but only after all unsuccessfull restarts (after "x"
restarts reached).

This patch is made for systemd-29, on master branch with SHA1 ID:
3661ac04b4f2840d3345605aa35963bbde3c469d

________systemd-restart-on-fail.patch : _______________________________

Index: systemd-29/src/dbus-service.c
===================================================================
--- systemd-29.orig/src/dbus-service.c	2011-06-22 10:47:14.000000000 +0300
+++ systemd-29/src/dbus-service.c	2011-06-22 13:48:51.292321742 +0300
@@ -42,6 +42,8 @@
          "<property name=\"PIDFile\" type=\"s\" access=\"read\"/>\n"   \
          "<property name=\"NotifyAccess\" type=\"s\" access=\"read\"/>\n" \
          "<property name=\"RestartUSec\" type=\"t\" access=\"read\"/>\n" \
+        "<property name=\"MaxRestartRetries\" type=\"i\" access=\"read\"/>\n" \
+        "<property name=\"RestartRetry\" type=\"i\" access=\"read\"/>\n" \
          "<property name=\"TimeoutUSec\" type=\"t\" access=\"read\"/>\n" \
          BUS_EXEC_COMMAND_INTERFACE("ExecStartPre")                      \
          BUS_EXEC_COMMAND_INTERFACE("ExecStart")                         \
@@ -103,6 +105,8 @@
                  { "org.freedesktop.systemd1.Service", "PIDFile",                bus_property_append_string, "s", u->service.pid_file                   },
                  { "org.freedesktop.systemd1.Service", "NotifyAccess",           bus_service_append_notify_access, "s",&u->service.notify_access       },
                  { "org.freedesktop.systemd1.Service", "RestartUSec",            bus_property_append_usec,   "t",&u->service.restart_usec              },
+                { "org.freedesktop.systemd1.Service", "MaxRestartRetries",      bus_property_append_int,    "i",&u->service.restart_retries           },
+                { "org.freedesktop.systemd1.Service", "RestartRetry",           bus_property_append_int,    "i",&u->service.restart_crt_retry         },
                  { "org.freedesktop.systemd1.Service", "TimeoutUSec",            bus_property_append_usec,   "t",&u->service.timeout_usec              },
                  BUS_EXEC_COMMAND_PROPERTY("org.freedesktop.systemd1.Service", u->service.exec_command[SERVICE_EXEC_START_PRE],  "ExecStartPre"),
                  BUS_EXEC_COMMAND_PROPERTY("org.freedesktop.systemd1.Service", u->service.exec_command[SERVICE_EXEC_START],      "ExecStart"),
Index: systemd-29/src/load-fragment.c
===================================================================
--- systemd-29.orig/src/load-fragment.c	2011-06-22 10:47:14.000000000 +0300
+++ systemd-29/src/load-fragment.c	2011-06-22 13:48:51.292321742 +0300
@@ -1951,6 +1951,7 @@
                  { "TimeoutSec",             config_parse_usec,            0,&u->service.timeout_usec,                        "Service" },
                  { "Type",                   config_parse_service_type,    0,&u->service.type,                                "Service" },
                  { "Restart",                config_parse_service_restart, 0,&u->service.restart,                             "Service" },
+                { "MaxRestartRetries",      config_parse_int,             0,&u->service.restart_retries,                     "Service" },
                  { "PermissionsStartOnly",   config_parse_bool,            0,&u->service.permissions_start_only,              "Service" },
                  { "RootDirectoryStartOnly", config_parse_bool,            0,&u->service.root_directory_start_only,           "Service" },
                  { "RemainAfterExit",        config_parse_bool,            0,&u->service.remain_after_exit,                   "Service" },
Index: systemd-29/src/service.c
===================================================================
--- systemd-29.orig/src/service.c	2011-06-22 10:47:14.000000000 +0300
+++ systemd-29/src/service.c	2011-06-22 13:48:51.296321774 +0300
@@ -113,6 +113,8 @@
          s->timeout_usec = DEFAULT_TIMEOUT_USEC;
          s->restart_usec = DEFAULT_RESTART_USEC;
          s->timer_watch.type = WATCH_INVALID;
+        s->restart_retries = DEFAULT_RESTART_RETRIES;
+        s->restart_crt_retry = 0;
  #ifdef HAVE_SYSV_COMPAT
          s->sysv_start_priority = -1;
          s->sysv_start_priority_from_rcnd = -1;
@@ -1151,6 +1153,22 @@
                  if (s->meta.default_dependencies)
                          if ((r = service_add_default_dependencies(s))<  0)
                                  return r;
+
+                /* If the option "RestartRetries=" is set then the service will be "revived" */
+                if (s->restart_retries != DEFAULT_RESTART_RETRIES){
+                        /* If the option "Restart=" is not set, then for the service to be "revived"
+                           we must set "Restart=" to "on-failure" */
+                        if (s->restart == SERVICE_RESTART_NO){
+                                s->restart = SERVICE_RESTART_ON_FAILURE;
+                        }
+
+                        /* If the option "Restart=" is already set, then for the service to be "revived"
+                           and be restarted on failure we must set "Restart=" to "always"
+                           only if it's "on-success" */
+                        else if (s->restart == SERVICE_RESTART_ON_SUCCESS){
+                                s->restart = SERVICE_RESTART_ALWAYS;
+                        }
+                }
          }

          return service_verify(s);
@@ -2291,6 +2309,7 @@

          /* Make sure we don't enter a busy loop of some kind. */
          if (!ratelimit_test(&s->ratelimit)) {
+                s->restart_crt_retry = 0;
                  log_warning("%s start request repeated too quickly, refusing to start.", u->meta.id);
                  return -ECANCELED;
          }
@@ -2311,6 +2330,7 @@

          /* This is a user request, so don't do restarts on this
           * shutdown. */
+        s->restart_crt_retry = 0;
          s->forbid_restart = true;

          /* Already on it */
@@ -2604,6 +2624,11 @@
                           * gone. */
                          s->main_command = NULL;

+                        if (success&&  s->restart_crt_retry){
+                                log_debug("%s changed restart_crt_retry from %d to 0", u->meta.id, s->restart_crt_retry);
+                                s->restart_crt_retry = 0;
+                        }
+
                          switch (s->state) {

                          case SERVICE_START_POST:
@@ -2832,7 +2857,15 @@
                  break;

          case SERVICE_AUTO_RESTART:
-                log_info("%s holdoff time over, scheduling restart.", u->meta.id);
+                if (0<= s->restart_retries){
+                        s->restart_crt_retry ++;
+                        if (s->restart_crt_retry>  s->restart_retries){
+                                log_warning("%s maximum number of restarting retries reached. Stopping.", u->meta.id);
+                                service_enter_dead(s, true, false);
+                                break;
+                        }
+                }
+                log_info("%s holdoff time over, scheduling restart (retry %d of %d).", u->meta.id, s->restart_crt_retry, s->restart_retries);
                  service_enter_restart(s);
                  break;

Index: systemd-29/src/service.h
===================================================================
--- systemd-29.orig/src/service.h	2011-06-22 10:47:14.000000000 +0300
+++ systemd-29/src/service.h	2011-06-22 13:48:51.296321774 +0300
@@ -92,6 +92,9 @@
          ServiceType type;
          ServiceRestart restart;

+        int restart_retries;
+        int restart_crt_retry;
+
          /* If set we'll read the main daemon PID from this file */
          char *pid_file;

Index: systemd-29/src/unit.c
===================================================================
--- systemd-29.orig/src/unit.c	2011-06-22 10:47:14.000000000 +0300
+++ systemd-29/src/unit.c	2011-06-22 13:48:51.296321774 +0300
@@ -1114,6 +1114,10 @@
          if (set_size(u->meta.dependencies[UNIT_ON_FAILURE])<= 0)
                  return;

+        if (u->meta.type == UNIT_SERVICE&&
+                u->service.restart_crt_retry<= u->service.restart_retries)
+                return;
+
          log_info("Triggering OnFailure= dependencies of %s.", u->meta.id);

          SET_FOREACH(other, u->meta.dependencies[UNIT_ON_FAILURE], i) {
@@ -1245,6 +1249,17 @@
                          log_notice("Unit %s entered failed state.", u->meta.id);
                          unit_trigger_on_failure(u);
                  }
+
+                /* When a unit is of type service and has finished,
+                   reset restart-retry counter, if applicable */
+                if (ns != os&&
+                    u->meta.type == UNIT_SERVICE&&
+                    UNIT_IS_INACTIVE_OR_FAILED(ns)&&
+                    u->service.restart_crt_retry>  u->service.restart_retries ){
+                            log_debug("Unit %s entered inactive or failed, restart retries at max, reset counter (%d ->  0)!",
+                                      u->meta.id, u->service.restart_crt_retry, u->service.restart_retries);
+                            u->service.restart_crt_retry = 0;
+                }
          }

          /* Some names are special */
Index: systemd-29/src/unit.h
===================================================================
--- systemd-29.orig/src/unit.h	2011-06-22 10:47:14.000000000 +0300
+++ systemd-29/src/unit.h	2011-06-22 13:48:51.296321774 +0300
@@ -40,6 +40,9 @@
  #include "execute.h"
  #include "condition.h"

+/* default = infinite retries (= systemd behaviour not patched) */
+#define DEFAULT_RESTART_RETRIES -1
+
  enum UnitType {
          UNIT_SERVICE = 0,
          UNIT_SOCKET,

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/systemd-devel/attachments/20110701/9c0e8ea1/attachment-0001.htm>


More information about the systemd-devel mailing list