[systemd-devel] Solution proposal for bug 56109

markohoyer at freenet.de markohoyer at freenet.de
Thu May 16 09:23:15 PDT 2013


 
Hello systemd experts and developers,
 
I recently ran into the watchdog bug that has already been reported in the freedesktop.org Bugzilla (bug 56109).
 
I analyzed the bug and arrived at a simple fix.
 
First, what I think is going on:
- a watchdog timeout is detected in service_handle_watchdog(), which calls service_enter_dead(…)
- service_enter_dead() sets the service state to SERVICE_AUTO_RESTART
- a timer then triggers a call to service_enter_restart()
- service_enter_restart() schedules a restart job
- systemd splits the restart job into a stop job and a start job and schedules both
- the stop job results in a call to service_stop()
- this is where it gets interesting: because the state is SERVICE_AUTO_RESTART, service_stop() goes directly into the dead state and skips the normal stop procedure entirely. This is probably because in most cases that lead to a scheduled restart, the service has already stopped on its own (for instance when it was killed or exited normally). That is not true for a watchdog timeout: none of the stop procedure is executed. The process that failed to send the watchdog notification keeps running (in whatever state it is in), nothing cleans it up, and a second instance of the service is started.
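The sequence above can be modelled in a few lines of C. This is a hypothetical, heavily simplified sketch, not systemd's code: the names Service, ServiceState, ServiceResult, model_handle_watchdog() and model_stop_unpatched() are illustrative only, the main process is reduced to a single main_pid_alive flag, and the whole stop procedure is collapsed into clearing that flag. It assumes, as the analysis describes, that the watchdog path only moves the unit to SERVICE_AUTO_RESTART without signalling the process.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model for illustration; these are NOT systemd's
 * real definitions, only the states and results needed here. */
typedef enum {
        SERVICE_RUNNING,
        SERVICE_EXITED,
        SERVICE_AUTO_RESTART,
        SERVICE_DEAD,
} ServiceState;

typedef enum {
        SERVICE_SUCCESS,
        SERVICE_FAILURE_SIGNAL,
        SERVICE_FAILURE_WATCHDOG,
} ServiceResult;

typedef struct {
        ServiceState state;
        ServiceResult result;
        bool main_pid_alive; /* assumption: stands in for the real main process */
} Service;

/* Watchdog timeout as described in the analysis: the result is recorded and
 * a restart is scheduled, but the main process is never signalled. */
static void model_handle_watchdog(Service *s) {
        s->result = SERVICE_FAILURE_WATCHDOG;
        s->state = SERVICE_AUTO_RESTART;
}

/* Unpatched stop logic: a pending auto-restart skips the stop procedure. */
static int model_stop_unpatched(Service *s) {
        if (s->state == SERVICE_AUTO_RESTART) {
                s->state = SERVICE_DEAD; /* the process is NOT touched here */
                return 0;
        }

        assert(s->state == SERVICE_RUNNING || s->state == SERVICE_EXITED);
        s->main_pid_alive = false; /* stand-in for the normal stop procedure */
        s->state = SERVICE_DEAD;
        return 0;
}
```

Starting from a running service, calling model_handle_watchdog() and then model_stop_unpatched() leaves the unit in SERVICE_DEAD while main_pid_alive is still true: that is the leftover process which the following start job then duplicates. A plain stop from SERVICE_RUNNING, by contrast, does clear the flag.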
 
My suggestion to solve this:
 
Changes are needed in service.c in service_stop(…).
 
change:
        /* A restart will be scheduled or is in progress. */
        if (s->state == SERVICE_AUTO_RESTART) {
                service_set_state(s, SERVICE_DEAD);
                return 0;
        }
 
to:
        /* A restart will be scheduled or is in progress.
         * In all cases except a watchdog timeout, the service has already
         * been stopped automatically by the time we get here. */
        if (s->state == SERVICE_AUTO_RESTART && s->result != SERVICE_FAILURE_WATCHDOG) {
                service_set_state(s, SERVICE_DEAD);
                return 0;
        }
 
and change:
 
        assert(s->state == SERVICE_RUNNING ||
               s->state == SERVICE_EXITED);
 
 
to:
        assert(s->state == SERVICE_RUNNING ||
               s->state == SERVICE_AUTO_RESTART ||
               s->state == SERVICE_EXITED);
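The effect of both changes together can be checked against the same kind of toy model. Again this is a hypothetical sketch, not the real service_stop(): Service, ServiceState, ServiceResult and model_stop_patched() are illustrative names, and clearing main_pid_alive stands in for the whole stop procedure (ExecStop=, then SIGTERM, then SIGKILL). It shows why the assert has to accept SERVICE_AUTO_RESTART: after the first change, a watchdog failure in that state falls through to the normal stop path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model for illustration only; NOT systemd's types. */
typedef enum {
        SERVICE_RUNNING,
        SERVICE_EXITED,
        SERVICE_AUTO_RESTART,
        SERVICE_DEAD,
} ServiceState;

typedef enum {
        SERVICE_SUCCESS,
        SERVICE_FAILURE_SIGNAL,
        SERVICE_FAILURE_WATCHDOG,
} ServiceResult;

typedef struct {
        ServiceState state;
        ServiceResult result;
        bool main_pid_alive; /* assumption: stands in for the real main process */
} Service;

/* Patched stop logic following the two proposed changes. */
static int model_stop_patched(Service *s) {
        /* Restart pending and the process is already gone (exit, signal, ...):
         * nothing is left to stop. */
        if (s->state == SERVICE_AUTO_RESTART &&
            s->result != SERVICE_FAILURE_WATCHDOG) {
                s->state = SERVICE_DEAD;
                return 0;
        }

        /* A watchdog failure in SERVICE_AUTO_RESTART now reaches this point,
         * so the assertion must allow that state. */
        assert(s->state == SERVICE_RUNNING ||
               s->state == SERVICE_AUTO_RESTART ||
               s->state == SERVICE_EXITED);

        s->main_pid_alive = false; /* stand-in for the normal stop procedure */
        s->state = SERVICE_DEAD;
        return 0;
}
```

In this model a watchdog failure now goes through the stop path and clears main_pid_alive, while a restart after the service was killed (SERVICE_FAILURE_SIGNAL, process already gone) still takes the shortcut straight to SERVICE_DEAD.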
 
I tested the following:
- the watchdog mechanism now actually stops/kills the service when it does not send the watchdog notification in time
- a restart triggered by a killed service works as before
 
I hope I haven't missed any side effects of these changes.
 
 
Any opinions on my proposed changes?
 
Kind regards,
 
Marko Hoyer
 
 
 
 
 

