[systemd-devel] System stability when journald locks up

Lennart Poettering lennart at poettering.net
Wed May 30 14:02:39 PDT 2012


On Wed, 30.05.12 23:41, Marti Raudsepp (marti at juffo.org) wrote:

> > Adding the timeout change there (which would actually be dead-easy,
> > simply by using SO_SNDTIMEO) would not really fix the problem too well
> > though: given the number of messages that are generated, the system
> > might not be locked up entirely but still very slow.
> 
> True. Or O_NONBLOCK, but then we'd start dropping messages as soon as
> the kernel packet queue fills up.

O_NONBLOCK is what systemd itself uses when logging.
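For illustration, here is a minimal Python sketch of that non-blocking policy. The descriptor is put in O_NONBLOCK mode, and any message the kernel refuses with EAGAIN is dropped and counted rather than stalling the sender. This is not systemd's actual logging code. The helper name `send_or_drop` is hypothetical, a socketpair stands in for /dev/log, and the small SO_SNDBUF is an assumption so the demo fills the queue quickly.

```python
import socket

def send_or_drop(sock, messages):
    """Hypothetical helper: send each datagram, dropping it if the queue is full."""
    sent = dropped = 0
    for msg in messages:
        try:
            sock.send(msg)
            sent += 1
        except BlockingIOError:
            # EAGAIN: the kernel queue is full, so drop instead of blocking.
            dropped += 1
    return sent, dropped

# A socketpair stands in for /dev/log; nobody ever reads from `reader`.
reader, writer = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
# Small send buffer (demo assumption) so the queue fills after a few dozen messages.
writer.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 16384)
writer.setblocking(False)  # sets O_NONBLOCK on the file descriptor

sent, dropped = send_or_drop(writer, [b"<14>demo: hello"] * 5000)
print(f"sent={sent} dropped={dropped}")
reader.close()
writer.close()
```

The sender never waits. The cost is that everything past the queue limit is silently lost unless the sender reports the count itself.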

Basically there are two ways to deal with socket buffers that run
full: enlarge them or apply a timeout. We can do both easily with
SO_SNDBUF and SO_SNDTIMEO.
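A rough Python sketch of setting both options on an AF_UNIX datagram socket and reading them back. The helper names are hypothetical. Packing `struct timeval` as two native C longs is an assumption that holds on 64-bit Linux. socket(7) documents that Linux doubles the requested SO_SNDBUF value to leave room for bookkeeping overhead, so the value read back is usually larger than the one requested.

```python
import socket
import struct

# Assumption: struct timeval is two native C longs (true on 64-bit Linux).
TIMEVAL = struct.Struct("ll")

def configure_send_limits(sock, sndbuf_bytes, timeout_sec, timeout_usec=0):
    """Hypothetical helper: enlarge the send buffer and cap how long send() may block."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDTIMEO,
                    TIMEVAL.pack(timeout_sec, timeout_usec))

def read_send_limits(sock):
    """Hypothetical helper: return (SO_SNDBUF, (tv_sec, tv_usec)) as the kernel reports them."""
    sndbuf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDTIMEO, TIMEVAL.size)
    return sndbuf, TIMEVAL.unpack(raw)

sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
configure_send_limits(sock, 65536, 1)
sndbuf, (sec, usec) = read_send_limits(sock)
# On Linux, SO_SNDBUF is typically reported as double the requested value.
print(f"SO_SNDBUF={sndbuf} SO_SNDTIMEO={sec}s {usec}us")
sock.close()
```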

There are a couple of caveats though:

SO_SNDTIMEO sucks since it slows everything down while hardly helping
(as mentioned).
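To see why a timeout mostly just slows things down, here is a small Python sketch, assuming Linux and a `struct timeval` packed as two native longs. It fills a socketpair's queue with non-blocking sends, then performs three ordinary sends with a 200 ms SO_SNDTIMEO. Each send stalls for the full timeout before failing with EAGAIN, so a client that logs N messages into a full queue loses roughly N × 200 ms. The buffer size and timeout values are demo assumptions.

```python
import socket
import struct
import time

def fill_queue(sock, payload):
    """Send with MSG_DONTWAIT until the kernel reports EAGAIN; return how many were queued."""
    queued = 0
    while True:
        try:
            sock.send(payload, socket.MSG_DONTWAIT)
            queued += 1
        except BlockingIOError:
            return queued

reader, writer = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
writer.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 16384)
# 200 ms send timeout; struct timeval packed as two native longs (assumption).
writer.setsockopt(socket.SOL_SOCKET, socket.SO_SNDTIMEO, struct.pack("ll", 0, 200_000))

queued = fill_queue(writer, b"<14>demo: filler")

timeouts = 0
start = time.monotonic()
for _ in range(3):
    try:
        writer.send(b"<14>demo: one more")  # blocks until SO_SNDTIMEO expires
    except BlockingIOError:
        timeouts += 1
elapsed = time.monotonic() - start
print(f"queued={queued}, {timeouts} timed-out sends took {elapsed:.2f}s")
reader.close()
writer.close()
```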

SO_SNDBUF sucks since we currently lack a way to increase the AF_UNIX
SOCK_DGRAM queue length individually for each socket, and the default is
rather low (/proc/sys/net/unix/max_dgram_qlen, which defaults to 10).
After 10 queued messages, sending clients will block. This could be
fixed relatively easily by adding a new sockopt to the kernel. Our
plumbing wishlist has included an item for that for a while. The patch
is probably easy, but so far nobody has stepped up.
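On Linux, the effect of this per-receiver limit can be observed with a small Python sketch. A bound datagram socket never reads, and a separate, unconnected, non-blocking client counts how many datagrams are accepted before the kernel returns EAGAIN. A blocking client would sleep at that point instead. The socket path and helper names are hypothetical. When max_dgram_qlen is the limiting factor, the count should land near that sysctl's value, though SO_SNDBUF can cap it first.

```python
import os
import socket
import tempfile

def read_max_dgram_qlen():
    """Return the sysctl value, or None if /proc does not expose it (non-Linux)."""
    try:
        with open("/proc/sys/net/unix/max_dgram_qlen") as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None

def count_until_full(limit=100_000):
    """Send 1-byte datagrams from an unconnected client to a bound, idle receiver."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "log.sock")  # hypothetical socket path
        receiver = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        receiver.bind(path)
        client = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        client.setblocking(False)
        queued = 0
        try:
            while queued < limit:
                client.sendto(b"x", path)
                queued += 1
        except BlockingIOError:
            pass  # EAGAIN: the receiver's queue is full
        finally:
            client.close()
            receiver.close()
        return queued

qlen = read_max_dgram_qlen()
queued = count_until_full()
print(f"max_dgram_qlen={qlen}, datagrams queued before EAGAIN={queued}")
```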

> Well, one potential solution would be to spawn a thread whose only job
> is to pop messages from /dev/log and copy them to a larger user-space
> buffer, shared with journald. If this memory buffer fills up, we know
> for a fact that an application is continuously generating more messages
> than we can write out, so dropping them might be a good idea to
> prevent journald from becoming the bottleneck. Additionally, journald
> could emit a notice about the dropped messages.
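The quoted proposal could look roughly like the following Python sketch. It only illustrates the idea and is not journald code. The class name, capacity, and notice text are hypothetical, and a plain list stands in for datagrams the reader thread would pull from /dev/log.

```python
import threading
from collections import deque

class DropCountingBuffer:
    """Hypothetical bounded buffer shared by a reader thread and journald.

    push() never blocks: when the buffer is full, the message is discarded
    and counted, so a flooding client cannot stall the reader thread.
    """

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        self._dropped = 0
        self._lock = threading.Lock()

    def push(self, msg):
        with self._lock:
            if len(self._items) >= self._capacity:
                self._dropped += 1
                return False
            self._items.append(msg)
            return True

    def drain(self):
        """Return queued messages, plus a notice if anything was dropped since the last drain."""
        with self._lock:
            items = list(self._items)
            self._items.clear()
            dropped, self._dropped = self._dropped, 0
        if dropped:
            items.append(f"journald: {dropped} message(s) dropped")
        return items

def reader_thread(buf, incoming):
    # Stand-in for the thread that would recv() datagrams from /dev/log.
    for msg in incoming:
        buf.push(msg)

buf = DropCountingBuffer(capacity=3)
t = threading.Thread(target=reader_thread, args=(buf, [f"msg {i}" for i in range(5)]))
t.start()
t.join()
drained = buf.drain()  # what journald would write out next
print(drained)
```

The drop counter is reset on each drain, so every notice reports only the drops since the previous write-out.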

Well, I'd prefer if we could use the normal kernel socket buffers for
this (see above). However, with that approach we can't implement the
dropped-messages notice. But the kernel could easily provide a counter
for this too. So maybe another kernel patch for that?

Lennart

-- 
Lennart Poettering - Red Hat, Inc.

