<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>

    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
  </head>
  <body bgcolor="#ffffff" text="#000000">
    <div class="moz-text-flowed" style="font-family: -moz-fixed;
      font-size: 12px;" lang="x-unicode">On 02/08/2011 12:34 PM, Lennart
      Poettering wrote:
      <br>
      <blockquote type="cite" style="color: rgb(0, 0, 0);">On Mon,
        07.02.11 10:01, Cristian Patrascu (<a
          class="moz-txt-link-abbreviated"
          href="mailto:Cristian.Patrascu@windriver.com">Cristian.Patrascu@windriver.com</a>)
        wrote:
        <br>
        <br>
        <blockquote type="cite" style="color: rgb(0, 0, 0);">On
          02/03/2011 11:19 PM, Michael Biebl wrote:
          <br>
          <blockquote type="cite" style="color: rgb(0, 0, 0);">2011/2/3
            Tomasz Torcz<a class="moz-txt-link-rfc2396E"
              href="mailto:tomek@pipebreaker.pl">&lt;tomek@pipebreaker.pl&gt;</a>:
            <br>
            <blockquote type="cite" style="color: rgb(0, 0, 0);">On Thu,
              Feb 03, 2011 at 09:35:55PM +0100, Michael Biebl wrote:
              <br>
              <blockquote type="cite" style="color: rgb(0, 0, 0);">ExecStartOnFailure=/some/cmd
                (send an email or what not)
                <br>
              </blockquote>
              &nbsp; Isn't &#8220;OnFailure=send-email-to-admin.service&#8221; sufficent?
              <br>
            </blockquote>
            Having to use a service file for this use case is imho
            cumbersome,
            <br>
            especially since you can't pass information to that service
            afaik,
            <br>
            e.g. via env variables.
            <br>
            So this send-email-to-admin.service doesn't know which
            service it was,
            <br>
            that failed and its last status code. Or am I misinformed
            here?
            <br>
            <br>
            Also, does OnFailure= work as I described with regard to
            Restart=on-failure?
            <br>
            <br>
            Michael
            <br>
          </blockquote>
          On 02/03/2011 10:35 PM, Michael Biebl wrote:
          <br>
          <blockquote type="cite" style="color: rgb(0, 0, 0);">ExecStartOnFailure=/some/cmd
            (send an email or what not)
            <br>
            <br>
            If Restart=on-failure is not set, this command is executed
            when the
            <br>
            starting the service has failed.
            <br>
          </blockquote>
          2011/2/3 Cristian Patrascu<a class="moz-txt-link-rfc2396E"
            href="mailto:Cristian.Patrascu@windriver.com">&lt;Cristian.Patrascu@windriver.com&gt;</a>:
          <br>
          <blockquote type="cite" style="color: rgb(0, 0, 0);">-
            RestartRetries=n (where n is the number of restart retries)
            <br>
          </blockquote>
          Using "Restart=on-failure" and
          "OnFailure=some-other.service",&nbsp; the
          <br>
          specified service will be started for every restart that exits
          with
          <br>
          fail, of current service.
          <br>
        </blockquote>
        I think tis current behaviour probably makes little sense, so my
        <br>
        suggestion would be to simply change it and run the services
        listed in
        <br>
        OnFailure= only after we tried to restart the service and gave
        up.
        <br>
        <br>
        So here's what I'd suggest:
        <br>
        <br>
        a) Introduce RestartRetries= as you suggested
        <br>
        <br>
        b) Reshuffle things so that OnFailure= is run only after
        RestartRetries=
        <br>
        &nbsp;&nbsp;&nbsp; is reached when it is set or each time if RestartRetries= is
        not set.
        <br>
        <br>
        c) Do not introduce ExecPostRestarts=
        <br>
        <br>
        I think this would be a very simple, minimal and natural
        extension and
        <br>
        should do everything you need, right? I'd be happy to merge such
        a patch!
        <br>
        <br>
        Lennart
        <br>
      </blockquote>
      Patch is attached, sorry for answering so late.
      <br>
      <br>
      Functionality:
      <br>
      &nbsp;&nbsp;&nbsp;&nbsp; In case a service failed to start properly (it exited with a
      code
      <br>
      not 0 - not successfull) it has to be retried, to be restarted a
      <br>
      predefined number of times, until it ends successfully or until
      the
      <br>
      predefined number of restarts is reached. For that, there is the
      option
      <br>
      "MaxRestartRetries" wich accepts an integer = number of restarts.
      This
      <br>
      applies only to ".service" units.
      <br>
      <br>
      &nbsp;&nbsp;&nbsp;&nbsp; So, if a service has "MaxRestartRetries=&lt;x&gt;" and option
      <br>
      "Restart=no" or not set: it should be restarted maximum "x" times,
      if it
      <br>
      exits unsuccessfully every time; if service has
      "Restart=on-success"
      <br>
      then it will allways be restarted on successfull exit and maximum
      "x"
      <br>
      restarts with unsuccessfull exits. For "Restart=always" it will
      restart
      <br>
      "x" times max. if exited unsuccessfully and for successfull exit
      always
      <br>
      restart. Otherwise for "Restart=on-abort" it's the same
      functionality.
      <br>
      <br>
      &nbsp;&nbsp;&nbsp;&nbsp; After "x" restarts have happened, the "OnFailure" unit will
      start
      <br>
      (if specified) but only after all unsuccessfull restarts (after
      "x"
      <br>
      restarts reached).
      <br>
      <br>
      This patch is made for systemd-29, on master branch with SHA1 ID:
      <br>
      3661ac04b4f2840d3345605aa35963bbde3c469d
      <br>
      <br>
    </div>
    <div class="moz-text-plain" wrap="true" style="font-family:
      -moz-fixed; font-size: 12px;" lang="x-western">
      <pre wrap="">________systemd-restart-on-fail.patch : _______________________________

Index: systemd-29/src/dbus-service.c
===================================================================
--- systemd-29.orig/src/dbus-service.c        2011-06-22 10:47:14.000000000 +0300
+++ systemd-29/src/dbus-service.c        2011-06-22 13:48:51.292321742 +0300
@@ -42,6 +42,8 @@
         "  &lt;property name=\"PIDFile\" type=\"s\" access=\"read\"/&gt;\n"   \
         "  &lt;property name=\"NotifyAccess\" type=\"s\" access=\"read\"/&gt;\n" \
         "  &lt;property name=\"RestartUSec\" type=\"t\" access=\"read\"/&gt;\n" \
+        "  &lt;property name=\"MaxRestartRetries\" type=\"i\" access=\"read\"/&gt;\n" \
+        "  &lt;property name=\"RestartRetry\" type=\"i\" access=\"read\"/&gt;\n" \
         "  &lt;property name=\"TimeoutUSec\" type=\"t\" access=\"read\"/&gt;\n" \
         BUS_EXEC_COMMAND_INTERFACE("ExecStartPre")                      \
         BUS_EXEC_COMMAND_INTERFACE("ExecStart")                         \
@@ -103,6 +105,8 @@
                 { "org.freedesktop.systemd1.Service", "PIDFile",                bus_property_append_string, "s", u-&gt;service.pid_file                   },
                 { "org.freedesktop.systemd1.Service", "NotifyAccess",           bus_service_append_notify_access, "s", &amp;u-&gt;service.notify_access       },
                 { "org.freedesktop.systemd1.Service", "RestartUSec",            bus_property_append_usec,   "t", &amp;u-&gt;service.restart_usec              },
+                { "org.freedesktop.systemd1.Service", "MaxRestartRetries",      bus_property_append_int,    "i", &amp;u-&gt;service.restart_retries           },
+                { "org.freedesktop.systemd1.Service", "RestartRetry",           bus_property_append_int,    "i", &amp;u-&gt;service.restart_crt_retry         },
                 { "org.freedesktop.systemd1.Service", "TimeoutUSec",            bus_property_append_usec,   "t", &amp;u-&gt;service.timeout_usec              },
                 BUS_EXEC_COMMAND_PROPERTY("org.freedesktop.systemd1.Service", u-&gt;service.exec_command[SERVICE_EXEC_START_PRE],  "ExecStartPre"),
                 BUS_EXEC_COMMAND_PROPERTY("org.freedesktop.systemd1.Service", u-&gt;service.exec_command[SERVICE_EXEC_START],      "ExecStart"),
Index: systemd-29/src/load-fragment.c
===================================================================
--- systemd-29.orig/src/load-fragment.c        2011-06-22 10:47:14.000000000 +0300
+++ systemd-29/src/load-fragment.c        2011-06-22 13:48:51.292321742 +0300
@@ -1951,6 +1951,7 @@
                 { "TimeoutSec",             config_parse_usec,            0, &amp;u-&gt;service.timeout_usec,                        "Service" },
                 { "Type",                   config_parse_service_type,    0, &amp;u-&gt;service.type,                                "Service" },
                 { "Restart",                config_parse_service_restart, 0, &amp;u-&gt;service.restart,                             "Service" },
+                { "MaxRestartRetries",      config_parse_int,             0, &amp;u-&gt;service.restart_retries,                     "Service" },
                 { "PermissionsStartOnly",   config_parse_bool,            0, &amp;u-&gt;service.permissions_start_only,              "Service" },
                 { "RootDirectoryStartOnly", config_parse_bool,            0, &amp;u-&gt;service.root_directory_start_only,           "Service" },
                 { "RemainAfterExit",        config_parse_bool,            0, &amp;u-&gt;service.remain_after_exit,                   "Service" },
Index: systemd-29/src/service.c
===================================================================
--- systemd-29.orig/src/service.c        2011-06-22 10:47:14.000000000 +0300
+++ systemd-29/src/service.c        2011-06-22 13:48:51.296321774 +0300
@@ -113,6 +113,8 @@
         s-&gt;timeout_usec = DEFAULT_TIMEOUT_USEC;
         s-&gt;restart_usec = DEFAULT_RESTART_USEC;
         s-&gt;timer_watch.type = WATCH_INVALID;
+        s-&gt;restart_retries = DEFAULT_RESTART_RETRIES;
+        s-&gt;restart_crt_retry = 0;
 #ifdef HAVE_SYSV_COMPAT
         s-&gt;sysv_start_priority = -1;
         s-&gt;sysv_start_priority_from_rcnd = -1;
@@ -1151,6 +1153,22 @@
                 if (s-&gt;meta.default_dependencies)
                         if ((r = service_add_default_dependencies(s)) &lt; 0)
                                 return r;
+
+                /* If the option "RestartRetries=" is set then the service will be "revived" */
+                if (s-&gt;restart_retries != DEFAULT_RESTART_RETRIES){
+                        /* If the option "Restart=" is not set, then for the service to be "revived"
+                           we must set "Restart=" to "on-failure" */
+                        if (s-&gt;restart == SERVICE_RESTART_NO){
+                                s-&gt;restart = SERVICE_RESTART_ON_FAILURE;
+                        }
+
+                        /* If the option "Restart=" is already set, then for the service to be "revived"
+                           and be restarted on failure we must set "Restart=" to "always"
+                           only if it's "on-success" */
+                        else if (s-&gt;restart == SERVICE_RESTART_ON_SUCCESS){
+                                s-&gt;restart = SERVICE_RESTART_ALWAYS;
+                        }
+                }
         }
 
         return service_verify(s);
@@ -2291,6 +2309,7 @@
 
         /* Make sure we don't enter a busy loop of some kind. */
         if (!ratelimit_test(&amp;s-&gt;ratelimit)) {
+                s-&gt;restart_crt_retry = 0;
                 log_warning("%s start request repeated too quickly, refusing to start.", u-&gt;meta.id);
                 return -ECANCELED;
         }
@@ -2311,6 +2330,7 @@
 
         /* This is a user request, so don't do restarts on this
          * shutdown. */
+        s-&gt;restart_crt_retry = 0;
         s-&gt;forbid_restart = true;
 
         /* Already on it */
@@ -2604,6 +2624,11 @@
                          * gone. */
                         s-&gt;main_command = NULL;
 
+                        if (success &amp;&amp; s-&gt;restart_crt_retry){
+                                log_debug("%s changed restart_crt_retry from %d to 0", u-&gt;meta.id, s-&gt;restart_crt_retry);
+                                s-&gt;restart_crt_retry = 0;
+                        }
+
                         switch (s-&gt;state) {
 
                         case SERVICE_START_POST:
@@ -2832,7 +2857,15 @@
                 break;
 
         case SERVICE_AUTO_RESTART:
-                log_info("%s holdoff time over, scheduling restart.", u-&gt;meta.id);
+                if (0 &lt;= s-&gt;restart_retries){
+                        s-&gt;restart_crt_retry ++;
+                        if (s-&gt;restart_crt_retry &gt; s-&gt;restart_retries){
+                                log_warning("%s maximum number of restarting retries reached. Stopping.", u-&gt;meta.id);
+                                service_enter_dead(s, true, false);
+                                break;
+                        }
+                }
+                log_info("%s holdoff time over, scheduling restart (retry %d of %d).", u-&gt;meta.id, s-&gt;restart_crt_retry, s-&gt;restart_retries);
                 service_enter_restart(s);
                 break;
 
Index: systemd-29/src/service.h
===================================================================
--- systemd-29.orig/src/service.h        2011-06-22 10:47:14.000000000 +0300
+++ systemd-29/src/service.h        2011-06-22 13:48:51.296321774 +0300
@@ -92,6 +92,9 @@
         ServiceType type;
         ServiceRestart restart;
 
+        int restart_retries;
+        int restart_crt_retry;
+
         /* If set we'll read the main daemon PID from this file */
         char *pid_file;
 
Index: systemd-29/src/unit.c
===================================================================
--- systemd-29.orig/src/unit.c        2011-06-22 10:47:14.000000000 +0300
+++ systemd-29/src/unit.c        2011-06-22 13:48:51.296321774 +0300
@@ -1114,6 +1114,10 @@
         if (set_size(u-&gt;meta.dependencies[UNIT_ON_FAILURE]) &lt;= 0)
                 return;
 
+        if (u-&gt;meta.type == UNIT_SERVICE &amp;&amp;
+                u-&gt;service.restart_crt_retry &lt;= u-&gt;service.restart_retries)
+                return;
+
         log_info("Triggering OnFailure= dependencies of %s.", u-&gt;meta.id);
 
         SET_FOREACH(other, u-&gt;meta.dependencies[UNIT_ON_FAILURE], i) {
@@ -1245,6 +1249,17 @@
                         log_notice("Unit %s entered failed state.", u-&gt;meta.id);
                         unit_trigger_on_failure(u);
                 }
+
+                /* When a unit is of type service and has finished,
+                   reset restart-retry counter, if applicable */
+                if (ns != os &amp;&amp;
+                    u-&gt;meta.type == UNIT_SERVICE &amp;&amp;
+                    UNIT_IS_INACTIVE_OR_FAILED(ns) &amp;&amp;
+                    u-&gt;service.restart_crt_retry &gt; u-&gt;service.restart_retries ){
+                            log_debug("Unit %s entered inactive or failed, restart retries at max, reset counter (%d -&gt; 0)!",
+                                      u-&gt;meta.id, u-&gt;service.restart_crt_retry, u-&gt;service.restart_retries);
+                            u-&gt;service.restart_crt_retry = 0;
+                }
         }
 
         /* Some names are special */
Index: systemd-29/src/unit.h
===================================================================
--- systemd-29.orig/src/unit.h        2011-06-22 10:47:14.000000000 +0300
+++ systemd-29/src/unit.h        2011-06-22 13:48:51.296321774 +0300
@@ -40,6 +40,9 @@
 #include "execute.h"
 #include "condition.h"
 
+/* default = infinite retries (= systemd behaviour not patched) */
+#define DEFAULT_RESTART_RETRIES -1
+
 enum UnitType {
         UNIT_SERVICE = 0,
         UNIT_SOCKET,
</pre>
    </div>
  </body>
</html>