[systemd-devel] start / stop daemon
Bonno Bloksma
b.bloksma at tio.nl
Tue Aug 19 22:36:42 PDT 2014
Hi,
[....]
>> I wonder if the people developing systemd are paying attention to a development in the Windows
>> environment where the latest thing is that the service can report back that it is indeed still trying to
>> stop and not just hung or not reporting back. Windows will now kill a service after a certain time
>> when shutting down, in some cases killing a database that took A LONG TIME to shut down
>> and causing the database to become inconsistent. The new development is to make sure that does
>> not happen.
>> If systemd is trying to become smart about stopping services it might be a good idea to have this
>> built in. Also not just have the service report back "I am still busy" but also with a progress indicator
>> which NEEDS to increase at each report so systemd can detect whether the service is indeed
>> progressing towards a stopped state or hung in the getting there.
>> From the past I have seen things go wrong in communication when the only thing reported back is "I am busy" while there was no progress being made toward the finish.
>>
>> Is this something the systemd team has already put on the todo list or am I the first to suggest it?
>
> Possibly:
>
> http://thread.gmane.org/gmane.comp.sysutils.systemd.devel/21419
>
> http://article.gmane.org/gmane.comp.sysutils.systemd.devel/21997
>
> "Introducing sd_notify() messages that can notify PID 1 about daemons reloading or shutting down, has been on the TODO list for a while"
Ok, it looks like the problem has been seen before. The reason I think it is a good idea to also have a progress indicator is to make sure a daemon cannot keep a system from shutting down when there is no real progress towards a stopped state.
An example from my "communication days".
A file needs to be transported and the job is handed over to the file transfer protocol job, whichever protocol that might be in this case. The file transfer starts and after a while there are some errors on the line, so some blocks are resent. Before the job is finished the line conditions become so bad that there is still some data transfer going on, but each block sent has errors and needs to be resent. So the protocol job is still busy sending the file but no progress is being made.
In the good old days when we were still using phone lines and modems I have seen an international file transfer that should have lasted a few minutes keep a line open for several hours, until the operator noticed the line was busy when it should have been free and canceled the job. A later protocol included a progress indicator so it could keep track of whether any progress was being made towards the end. If that counter did not increase each time, a watchdog part of the protocol would kill the transfer.
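As a rough illustration of that kind of watchdog, here is a minimal Python sketch. The class name, the method name and the `max_stalled_checks` parameter are hypothetical and not taken from any real transfer protocol; the only idea borrowed from the description above is that the reported counter has to increase between checks.

```python
class ProgressWatchdog:
    """Hypothetical stall detector: progress must strictly increase between checks."""

    def __init__(self, max_stalled_checks=3):
        # How many consecutive checks without progress are tolerated (assumed value).
        self.max_stalled_checks = max_stalled_checks
        self.last_progress = None
        self.stalled_checks = 0

    def report(self, progress):
        """Record a progress value; return True if the job should be killed."""
        if self.last_progress is None or progress > self.last_progress:
            # Real progress: remember it and reset the stall counter.
            self.last_progress = progress
            self.stalled_checks = 0
        else:
            # "Still busy" but no further along than last time.
            self.stalled_checks += 1
        return self.stalled_checks >= self.max_stalled_checks
```

With `max_stalled_checks=2`, a transfer that keeps reporting the same block count is flagged on the second report that shows no increase, however busy the line still looks.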
Something like that could happen during daemon start / stop, mainly during stop I think. A job wants to complete some steps and tells systemd "hold on, I am still busy", but for whatever reason it can no longer complete those steps. At some point it needs to be killed so the system can continue shutting down or doing whatever else it was doing.
The "hard" part for the daemon will be deciding what to use as a "progress indicator", rather than simply incrementing an i++ counter each time systemd asks whether it has almost finished shutting down. You want something that indicates real progress but at the same time is small enough that it can increase each time systemd asks the daemon. On the other hand, that might not be too hard if systemd does not ask too often. ;-)
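To make the difference with an i++ counter concrete, here is a small Python sketch of the daemon side. `FlushJob` and its methods are made-up names for illustration only; the assumption is a daemon whose shutdown work consists of writing out a list of pending items.

```python
class FlushJob:
    """Hypothetical shutdown job: writes pending items out, one at a time."""

    def __init__(self, pending):
        self.pending = list(pending)
        self.done = 0  # grows only when an item has actually been written

    def step(self, write):
        """Try to write one item; count it only if the write succeeded."""
        if self.pending and write(self.pending[0]):
            self.pending.pop(0)
            self.done += 1

    def progress(self):
        """Value to report when asked; asking never changes it."""
        return self.done
```

Because `progress()` only reads the count of completed writes, a daemon stuck on a failing write keeps reporting the same number no matter how often it is asked.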
Most daemons will not need this feature and systemd can rely on a timeout killing the job if it does not stop within x seconds. But it would be good if the start / stop protocol allows for it when a notify part is developed.
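One possible way to combine the plain timeout with optional progress reports is sketched below in Python. This is not how systemd works; `StopDeadline`, `grace` and `hard_limit` are hypothetical names, and the absolute upper bound is an extra assumption added so that a slowly progressing daemon still cannot delay shutdown forever.

```python
class StopDeadline:
    """Hypothetical stop timer: plain timeout, extended only on real progress."""

    def __init__(self, started, grace=90.0, hard_limit=600.0):
        self.grace = grace
        self.deadline = started + grace            # default: kill after `grace` seconds
        self.hard_deadline = started + hard_limit  # never wait longer than this
        self.last_progress = None

    def on_report(self, now, progress):
        """Daemon says "still busy"; extend only if progress strictly increased."""
        if self.last_progress is None or progress > self.last_progress:
            self.last_progress = progress
            self.deadline = min(now + self.grace, self.hard_deadline)

    def expired(self, now):
        """True once the daemon should be killed."""
        return now >= self.deadline
```

A daemon that never reports anything simply hits the default timeout, while one that reports an increasing value gets more time, up to the hard limit.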
After all, if the system is running on the UPS batteries after a power failure and the low battery indicator was used to start a system shutdown, you want the system to shut down eventually, before the battery runs out. ;-)
Bonno Bloksma