[systemd-devel] Newbie question - Requires doesn't work properly

Reindl Harald h.reindl at thelounge.net
Fri Nov 22 04:24:20 PST 2013



On 22.11.2013 12:49, David Timothy Strauss wrote:
> It is the responsibility of whatever sends the watchdog to ensure everything's healthy, however necessary. It would
> be silly to spawn a thread and have it blindly report health to watchdog. The point is for that thread to do proper
> checks but ensure reports go in at the right intervals.

i know that but the *how* is the question

you can check whatever you like internally, but that does not
mean the service actually responds correctly to a client
connection over the network unless the check goes through
the same stack, meaning it makes a real network connection
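
a rough sketch of what "going through the same stack" could look like in python: open a real TCP connection and wait for the server greeting. the function name, the default `* OK` prefix (the usual IMAP greeting) and the timeout are illustrative assumptions, nothing that dbmail or systemd provides:

```python
import socket


def probe_greeting(host, port, expected_prefix=b"* OK", timeout=5.0):
    """Return True if the service accepts a TCP connection and greets as expected.

    Hypothetical helper: name, default prefix and timeout are assumptions
    made for this sketch only.
    """
    data = b""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            # Keep reading until enough bytes arrived to compare the prefix.
            while len(data) < len(expected_prefix):
                chunk = conn.recv(1024)
                if not chunk:  # peer closed the connection early
                    break
                data += chunk
    except OSError:
        # Covers refused connections, timeouts and resets alike.
        return False
    return data.startswith(expected_prefix)
```

this only proves that accept and the greeting work; a probe meant to catch the rare cases would have to continue with login, folder listing and so on.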

i spent hundreds of hours debugging dbmail for upstream
to find spinlocks and the like that only happen in
rare situations. one of them took a 16 hour stress test
with debug logging enabled before it happened, while on the
real server it took a few minutes to get triggered
by a random client action

that's the difference between theory and real workload

your internal checks are mostly theory, because in case of
a bug you have undefined behavior, and what you want to achieve
with the watchdog is to catch this undefined behavior and restart
the service. i doubt this will work in the rare cases where
the watchdog should restart it, unless the check goes through the
complete code path of a client. in case of an IMAP server you can
enter the spin loop anywhere from accepting the connection
to folder listing or receiving a message, and it may depend
on a buffer overflow under high concurrency, where different
threads touch each other in an unexpected way
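
as a sketch of how the watchdog thread and a real check could be combined: the thread sends `WATCHDOG=1` only while an end-to-end probe succeeds. `NOTIFY_SOCKET` and `WATCHDOG_USEC` are the environment variables systemd sets for the service; `sd_notify` here is a small re-implementation of the notification protocol, and `watchdog_interval`, `watchdog_loop`, `probe` and `rounds` are made-up names for illustration:

```python
import os
import socket
import time


def sd_notify(message):
    """Send one datagram to systemd's notify socket; False if none is set."""
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):  # "@" marks an abstract namespace socket
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), path)
    return True


def watchdog_interval(default=10.0):
    """Half of WATCHDOG_USEC in seconds, or `default` (an assumption) if unset."""
    usec = os.environ.get("WATCHDOG_USEC")
    return int(usec) / 2_000_000 if usec else default


def watchdog_loop(probe, rounds=None):
    """Ping the watchdog only while probe() reports success.

    `probe` is any callable returning True/False; `rounds` limits the number
    of iterations and exists only so the sketch can be exercised in a test.
    """
    interval = watchdog_interval()
    done = 0
    while rounds is None or done < rounds:
        if probe():
            sd_notify("WATCHDOG=1")
        # A failed probe sends nothing, so systemd's timeout expires
        # and the service is handled according to its Restart= setting.
        time.sleep(interval)
        done += 1
```

`probe` would be a callable that walks the client code path over the network and returns True or False. its own timeout has to stay below half of `WatchdogSec=`, otherwise a slow but healthy service misses the deadline.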

been there, nearly died debugging it and collecting data for upstream

> On Nov 22, 2013 7:50 PM, "Reindl Harald" <h.reindl at thelounge.net> wrote:
> 
> 
>     On 22.11.2013 03:04, salil GK wrote:
>     > Thanks a lot David
>     >
>     > On 22 November 2013 06:44, David Timothy Strauss <david at davidstrauss.net> wrote:
>     >
>     >     On Thu, Nov 21, 2013 at 4:57 PM, salil GK <gksalil at gmail.com> wrote:
>     >     > What happens is - my process may be busy with some other activity, during
>     >     > which time it will fail to send the periodic message to systemd. After a while
>     >     > it will come out of its loop and be ready to serve. But by this time the
>     >     > system would have already marked the process as failed.
>     >
>     >     Then you need to either use another thread, refactor to make a tighter
>     >     event loop, or increase the watchdog time. Drifting in and out of
>     >     tolerance with watchdog is not a safe strategy.
> 
>     the problem i see with "use another thread" is that this thread can happily
>     work and send its keep-alive, but that does not mean in the end that the
>     service itself is working OK and responsive, because both are running
>     isolated
> 
>     in case of network services it would be pretty cool if the systemd watchdog
>     could be configured to connect to the service every n seconds and restart
>     it if there is no response, because this would monitor the real service
>     without needing external tools
