[systemd-devel] Newbie question - Requires doesn't work properly
Reindl Harald
h.reindl at thelounge.net
Fri Nov 22 04:24:20 PST 2013
On 22.11.2013 12:49, David Timothy Strauss wrote:
> It is the responsibility of whatever sends the watchdog to ensure everything's healthy, however necessary. It would
> be silly to spawn a thread and have it blindly report health to watchdog. The point is for that thread to do proper
> checks but ensure reports go in at the right intervals.
I know that, but the *how* is the question.
You can check whatever you like internally, but that does not
mean that, at the end of the day, the service responds correctly
to a client connection over the network unless you go through
the same stack, i.e. by actually making a network connection.
I spent hundreds of hours on upstream debugging of dbmail
to find spinlocks and other problems that only happened in
rare situations. One of them took 16 hours of stress tests
until it happened with debug logging enabled, while on the
real server it was triggered within a few minutes
by a random client action.
That's the difference between theory and real workload.
Your internal checks are mostly theory, because in the case of
a bug you have undefined behavior, and what you want to achieve
with the watchdog is to catch this undefined behavior and restart
the service. When in doubt, internal checks will not work in the
rare cases where the watchdog should restart the service, unless
you exercise the complete code path of a client. In the case of
an IMAP server you can enter the spin loop anywhere from accepting
the connection to listing folders or receiving a message, and it
may depend on a buffer overflow under high concurrency, with
different threads interfering with each other in unexpected ways.
I've been there, and nearly died debugging it and collecting data for upstream.
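Reindl's argument can be sketched as an end-to-end probe that talks to an IMAP server over TCP the way a client would, instead of trusting an internal flag. This is a hypothetical helper, not something systemd provides. It assumes a plain-text IMAP listener (port 143 by default) and sends only NOOP and LOGOUT, which RFC 3501 permits in any connection state. The tags `a1` and `a2` are arbitrary.

```python
import socket


def imap_probe(host: str, port: int = 143, timeout: float = 5.0) -> bool:
    """End-to-end health check: behave like a minimal IMAP client.

    Connects over TCP, expects an untagged greeting, sends NOOP and
    waits for the tagged OK. Any timeout, reset or unexpected reply
    counts as unhealthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with sock.makefile("rb") as reader:
                greeting = reader.readline()
                if not greeting.startswith((b"* OK", b"* PREAUTH")):
                    return False
                sock.sendall(b"a1 NOOP\r\n")
                while True:
                    line = reader.readline()
                    if not line:
                        return False  # server closed the connection
                    if line.startswith(b"a1 "):
                        break  # tagged completion; skip untagged "* ..." lines
                if not line.startswith(b"a1 OK"):
                    return False
                sock.sendall(b"a2 LOGOUT\r\n")
                return True
    except OSError:  # covers socket.timeout, refused and reset connections
        return False
```

A probe like this could serve as the condition for sending WATCHDOG=1. It still only exercises part of the client code path (accept, greeting, command parsing), not login, folder listing or message delivery.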
> On Nov 22, 2013 7:50 PM, "Reindl Harald" <h.reindl at thelounge.net> wrote:
>
>
> On 22.11.2013 03:04, salil GK wrote:
> > Thanks a lot David
> >
> > On 22 November 2013 06:44, David Timothy Strauss <david at davidstrauss.net> wrote:
> >
> > On Thu, Nov 21, 2013 at 4:57 PM, salil GK <gksalil at gmail.com> wrote:
> > > What happens is this: my process may be busy with some other activity,
> > > during which time it fails to send the periodic message to systemd. After
> > > a while it comes out of its loop and is ready to serve again, but by then
> > > systemd has already marked the process as failed.
> >
> > Then you need to either use another thread, refactor to make a tighter
> > event loop, or increase the watchdog time. Drifting in and out of
> > tolerance with watchdog is not a safe strategy.
>
> The problem I see with "use another thread" is that this thread can happily
> keep working and sending its keep-alive, but in the end that does not mean
> the service itself is working OK and responsive, because the two run
> isolated from each other.
>
> For network services it would be pretty cool if the systemd watchdog
> could be configured to connect to the service every n seconds and restart
> it if there is no response, because this would monitor the real service
> without the need for external tools.
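David's second option in the quoted exchange above, refactoring into a tighter event loop, can be sketched in Python as a job that yields control after each bounded slice of work, so the same loop that does the work also sends the keep-alive. Unlike a helper thread, the ping then only goes out while the working code actually makes progress. `run_with_watchdog` and `slow_job` are hypothetical names; in a real service `ping` would send WATCHDOG=1 via sd_notify(), and `interval` would be half of WatchdogSec=.

```python
import time
from typing import Callable, Iterator


def run_with_watchdog(job: Iterator[None], ping: Callable[[], None],
                      interval: float) -> int:
    """Drive a long job that yields after each bounded slice of work.

    Between slices the loop checks the clock and calls ping() at most
    once per `interval` seconds. Returns how many pings were sent.
    """
    pings = 0
    last = time.monotonic()
    for _ in job:  # each iteration = one finished slice of work
        now = time.monotonic()
        if now - last >= interval:
            ping()
            pings += 1
            last = now
    return pings


def slow_job(slices: int, seconds_per_slice: float) -> Iterator[None]:
    """Hypothetical stand-in for the 'other activity' that blocks the service."""
    for _ in range(slices):
        time.sleep(seconds_per_slice)  # placeholder for real work
        yield
```

The trade-off is that every slice must stay well below the watchdog interval; a single slice that hangs stops the pings, which is the intended behavior.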