[systemd-devel] [RFC] [PATCH 0/3] journal: Add deferred log processing to reduce synchronous IO overhead

Fri Dec 13 13:16:16 PST 2013

On Fri, Dec 13, 2013 at 03:45:36PM +0100, Lennart Poettering wrote:
> On Fri, 13.12.13 12:46, Karol Lewandowski (k.lewandowsk at samsung.com) wrote:

> Well, are you suggesting that the AF_UNIX/SOCK_DGRAM code actually hands
> off the timeslice to the other side as soon as it queued something in?

If by other side you mean the receiving one, then no - QNX seems to do
that, but it isn't the case here. What I'm trying to say is that the
kernel puts the process doing send(2) to sleep when (a) the queue fills
up and (b) the fd is a blocking one (otherwise we just get EAGAIN).
That's expected, I presume.
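
For illustration only (this is not the sock-client test program from
the previous mail), here is a minimal Python sketch of the non-blocking
half of that statement. It assumes Linux and uses a socketpair() in
place of a real client/server pair; the function name, payload size and
loop limit are made up for the example.

```python
import errno
import socket


def count_sends_until_eagain(payload_size=64, limit=100000):
    """Count datagrams accepted before a non-blocking send() reports EAGAIN."""
    tx, rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        # Non-blocking fd: a full queue yields EAGAIN instead of a sleep.
        tx.setblocking(False)
        payload = b"x" * payload_size
        sent = 0
        while sent < limit:
            try:
                tx.send(payload)
            except BlockingIOError as exc:
                # Python raises BlockingIOError for EAGAIN/EWOULDBLOCK.
                assert exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK)
                break
            sent += 1
        return sent
    finally:
        tx.close()
        rx.close()


if __name__ == "__main__":
    print("datagrams queued before EAGAIN:", count_sends_until_eagain())
```

Nothing ever reads from the receiving end here, so the count is simply
how many datagrams the kernel accepts before it refuses the next one.
With a blocking fd the same loop would sleep in send() at that point.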

One of the problems I see, though, is that no matter how deep I make
the queue (`max_dgram_qlen') I still see the process sleeping on send()
way earlier than the configured queue depth would suggest.
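
When chasing this, two numbers may be worth looking at side by side:
the configured queue length and the sender's default send buffer size.
Whether the send buffer is what limits the sender in this case is an
assumption, not something established in this thread. The sketch below
only reads the two values; the /proc path is Linux-specific and the
helper names are made up for the example.

```python
import socket

# Linux-specific sysctl file for the AF_UNIX datagram queue length.
QLEN_PATH = "/proc/sys/net/unix/max_dgram_qlen"


def read_max_dgram_qlen(path=QLEN_PATH):
    """Return the configured queue length, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def default_sndbuf():
    """Return SO_SNDBUF (in bytes) of a freshly created AF_UNIX datagram socket."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    finally:
        s.close()


if __name__ == "__main__":
    print("max_dgram_qlen:", read_max_dgram_qlen())
    print("default SO_SNDBUF:", default_sndbuf())
```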

> That would be news to me. The kernel does context switches when the
> timeslice is over, not earlier afaik, and you'll get that anyway...

If you take a look at Results.sock_dgram from my previous mail you
will find this for sock-client:

        Voluntary context switches: 577696
        Involuntary context switches: 633

Please correct me if I'm wrong, but I think that an involuntary context
switch happens when the process' timeslice is over, whereas a voluntary
one is caused by a syscall, probably doing some form of IO
(read/send/select/etc.)
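
The two counters can also be read from within a process via
getrusage(2). A minimal Python sketch, assuming a platform where the
kernel fills in ru_nvcsw and ru_nivcsw (Linux does); the helper name is
made up for the example:

```python
import resource
import time


def context_switches():
    """Return (voluntary, involuntary) context switch counts for this process."""
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return ru.ru_nvcsw, ru.ru_nivcsw


if __name__ == "__main__":
    before = context_switches()
    # Sleeping gives up the CPU from inside a syscall, so it is expected
    # to show up in the voluntary counter rather than the involuntary one.
    time.sleep(0.01)
    after = context_switches()
    print("voluntary:  ", before[0], "->", after[0])
    print("involuntary:", before[1], "->", after[1])
```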

If I make sock-server sleep(5) just before poll and strace the client,
I can clearly see that it blocks on send(). Only after the receiving
side picks a packet up is the client able to send another one.
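
A rough way to reproduce that without strace, sketched in Python under
the assumption of Linux AF_UNIX datagram semantics. It uses a
socketpair() and a non-blocking sender so that "would block" shows up as
an error instead of a sleep; the function name and sizes are made up
for the example.

```python
import socket


def send_after_drain(payload=b"x" * 64, limit=100000):
    """Fill the queue, drain one datagram, then try to send once more.

    Returns True if the retry succeeds, False if it would still block,
    and None if the queue never filled up within `limit` sends.
    """
    tx, rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        tx.setblocking(False)
        for _ in range(limit):
            try:
                tx.send(payload)
            except BlockingIOError:
                break  # a blocking fd would go to sleep here
        else:
            return None
        # The receiving side picks up exactly one datagram...
        rx.recv(len(payload))
        # ...and the sender tries again.
        try:
            tx.send(payload)
        except BlockingIOError:
            return False
        return True
    finally:
        tx.close()
        rx.close()


if __name__ == "__main__":
    print("send succeeded after one recv:", send_after_drain())
```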

This is what I think is happening; please let me know if this sounds
like utter nonsense to you. :)

> > > Here's another option: extend journald to use kdbus as additional
> > > transport. This is something we want to do anyway since the kdbus
> > > transport will attach the metadata we need without race to each
> > > packet. Given that kdbus ultimately is just a way to write into an
> > > mmaped tmpfs that some other process owns this should not be much worse
> > > than the android logger in performance.
> > 
> > I doubt it would help as other side would still be woken up on _every_
> > message, right?
> 
> Well, it gets woken up if it's waiting for that. But the kernel will only
> give CPU time to it when the sender's timeslice is over...

I'm either doing something terribly wrong or the timeslice can end due
to the call to send(2) and friends.

> There's very
> little difference to mmap... I mean, you need to tell the other side
> that it should look in the buffer, how do you want to do that otherwise?

I don't.  I just want the buffer to be huge enough not to cause the
client to block, effectively waiting for the receiving side to pick
stuff up.

Cheers,
Karol