[systemd-devel] Solution proposal for bug 56109
markohoyer at freenet.de
Thu May 16 09:23:15 PDT 2013
Hello systemd experts and developers,
I recently ran into the watchdog bug that has already been reported in the freedesktop.org Bugzilla (bug 56109).
I analyzed the bug and found what I think is a simple fix.
First, what I think is going on:
- A watchdog timeout is detected in service_handle_watchdog(), which calls service_enter_dead(…).
- service_enter_dead() sets the service state to SERVICE_AUTO_RESTART.
- Triggered by a timer, service_enter_restart() is called.
- service_enter_restart() schedules a restart job.
- systemd splits the restart job into a stop job and a start job and schedules both.
- The stop job leads to a call of service_stop().
- This is where it gets interesting:
- Because the state is SERVICE_AUTO_RESTART, service_stop() goes directly to the dead state and skips the normal stop procedure. This is probably because, in most cases that lead to a scheduled restart, the stop procedure has already run automatically (for instance, when the service was killed or exited normally). That is not true for a watchdog timeout: nothing of the stop procedure is executed. The process that failed to send the watchdog event keeps running, in whatever state it is in, and no one cleans up. A second instance of the service is then started.
My suggestion to solve this:
Changes are needed in service.c in service_stop(…).
change:
    /* A restart will be scheduled or is in progress. */
    if (s->state == SERVICE_AUTO_RESTART) {
            service_set_state(s, SERVICE_DEAD);
            return 0;
    }
to:
    /* A restart will be scheduled or is in progress. In all cases
     * but the watchdog timeout, the stop procedure has already
     * been carried out by systemd. */
    if (s->state == SERVICE_AUTO_RESTART && s->result != SERVICE_FAILURE_WATCHDOG) {
            service_set_state(s, SERVICE_DEAD);
            return 0;
    }
and change:
    assert(s->state == SERVICE_RUNNING ||
           s->state == SERVICE_EXITED);
to:
    assert(s->state == SERVICE_RUNNING ||
           s->state == SERVICE_AUTO_RESTART ||
           s->state == SERVICE_EXITED);
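To make the intended effect easier to follow, here is a minimal stand-alone sketch of the decision in service_stop() before and after the proposed change. It is only a toy model: every Model* type and model_* function is a hypothetical stand-in invented for illustration, not systemd's real definitions, and the real stop procedure is reduced to clearing a "process alive" flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the service state and result. */
typedef enum { MODEL_RUNNING, MODEL_EXITED, MODEL_AUTO_RESTART, MODEL_DEAD } ModelState;
typedef enum { MODEL_SUCCESS, MODEL_FAILURE_SIGNAL, MODEL_FAILURE_WATCHDOG } ModelResult;

typedef struct {
        ModelState state;
        ModelResult result;
        bool main_process_alive;
} ModelService;

/* Watchdog timeout: a restart is scheduled, but the process keeps running. */
void model_watchdog_timeout(ModelService *s) {
        s->result = MODEL_FAILURE_WATCHDOG;
        s->state = MODEL_AUTO_RESTART;
}

/* Service killed by a signal: the process is already gone when the restart is scheduled. */
void model_killed(ModelService *s) {
        s->main_process_alive = false;
        s->result = MODEL_FAILURE_SIGNAL;
        s->state = MODEL_AUTO_RESTART;
}

/* Stand-in for the normal stop procedure: the remaining process is cleaned up. */
void model_enter_stop(ModelService *s) {
        s->main_process_alive = false;
        s->state = MODEL_DEAD;
}

/* Current behaviour: AUTO_RESTART always short-circuits to DEAD. */
int model_stop_current(ModelService *s) {
        if (s->state == MODEL_AUTO_RESTART) {
                s->state = MODEL_DEAD;
                return 0;
        }
        assert(s->state == MODEL_RUNNING || s->state == MODEL_EXITED);
        model_enter_stop(s);
        return 0;
}

/* Proposed behaviour: a watchdog failure still runs the stop procedure. */
int model_stop_proposed(ModelService *s) {
        if (s->state == MODEL_AUTO_RESTART && s->result != MODEL_FAILURE_WATCHDOG) {
                s->state = MODEL_DEAD;
                return 0;
        }
        assert(s->state == MODEL_RUNNING ||
               s->state == MODEL_AUTO_RESTART ||
               s->state == MODEL_EXITED);
        model_enter_stop(s);
        return 0;
}
```

In this model, calling model_watchdog_timeout() followed by model_stop_current() ends in MODEL_DEAD with main_process_alive still true, which corresponds to the leftover process described above. With model_stop_proposed() the flag is cleared, while the model_killed() path behaves the same in both variants.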
I tested the following:
- The watchdog mechanism now actually stops / kills the service if it does not send the watchdog event in time.
- A restart triggered by a killed service works as before.
Hopefully, I didn't miss any side effects of my changes.
Any opinions on my proposed changes?
Kind regards,
Marko Hoyer