[systemd-devel] [RFC] [PATCH 0/3] journal: Add deferred log processing to reduce synchronous IO overhead

Lennart Poettering lennart at poettering.net
Mon Dec 16 07:54:06 PST 2013


On Mon, 16.12.13 15:36, Karol Lewandowski (k.lewandowsk at samsung.com) wrote:

> On 12/14/2013 04:47 AM, Lennart Poettering wrote:
> > On Fri, 13.12.13 22:16, Karol Lewandowski (lmctlx at gmail.com) wrote:
> >> On Fri, Dec 13, 2013 at 03:45:36PM +0100, Lennart Poettering wrote:
> >>> On Fri, 13.12.13 12:46, Karol Lewandowski (k.lewandowsk at samsung.com) wrote:
> 
> >> One of the problems I see, though, is that no matter how deep I make
> >> the queue (`max_dgram_qlen') I still see the process sleeping on send()
> >> way earlier than the configured queue depth would suggest.
> 
> > It would be interesting to find out why this happens. I mean, there are
> > three parameters here I could think of that matter: the qlen, SO_SNDBUF
> > on the sender, and SO_RCVBUF on the receiver (though the latter two might
> > actually change the same value on AF_UNIX? or maybe one of the latter
> > two is a NOP on AF_UNIX?). If any of them reaches the limit then the
> > sender will necessarily have to block.
> > 
> > (SO_SNDBUF and SO_RCVBUF can also be controlled via
> > /proc/sys/net/core/rmem* and ../wmem*... For testing purposes it might
> > be easier to play around with these and set them to ludicrously high
> > values...)
> 
> That's it.
> 
> While the journal code tries to set the buffer size to 8MB via the
> SO_SNDBUF/SO_RCVBUF options, the kernel limits these to
> wmem_max/rmem_max. On the machines I've tested, the respective values
> are quite small - around 150-200kB each.

Hmm, so on journald's side we actually use SO_RCVBUFFORCE to override
that kernel limit. If I understood you correctly, though, then SO_SNDBUF
on the sending side is the issue here, not SO_RCVBUF on the receiving
side.

We could certainly update src/journal/journal-send.c to also use
SO_SNDBUFFORCE on the client side, but that would leave unprivileged
clients and traditional /dev/log clients in the cold, since
SO_SNDBUFFORCE requires privileges, and the client side for /dev/log
lives in glibc, not in systemd.

> Increasing these did reduce context switches considerably - preliminary
> tests show that I can now queue thousands of messages (~5k) without
> problems.  I will test this thoroughly in the next few days.
> 
> I do wonder what is the rationale behind such low limits...

Well, usually the logic is to keep things conservative until you notice
that this creates issues.
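For reference, the limits in question can be inspected under /proc and,
with root, raised for testing. The values below are only illustrative
examples; the 8MB figure mirrors what the journal code requests:

```shell
# Current caps on SO_SNDBUF / SO_RCVBUF for unprivileged callers:
cat /proc/sys/net/core/wmem_max
cat /proc/sys/net/core/rmem_max

# AF_UNIX datagram queue depth (the max_dgram_qlen discussed above):
cat /proc/sys/net/unix/max_dgram_qlen

# For testing only, raise them (requires root; values are examples):
# sysctl -w net.core.wmem_max=8388608
# sysctl -w net.core.rmem_max=8388608
# sysctl -w net.unix.max_dgram_qlen=512
```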

To fix this properly and comprehensively, I'd really like to see three
changes in the kernel:

- Introduce a pair of SO_QLEN and SO_QLENFORCE sockopts to the kernel so
  that we can set the qlen per-socket

- Make the rmem/wmem defaults configurable at kernel compile time

- Increase the kernel's defaults

Lennart

-- 
Lennart Poettering, Red Hat
