Comment # 14 on bug 66664 - "journalctl boot reporting inconsistent"
(https://bugs.freedesktop.org/show_bug.cgi?id=66664#c14)
From: hortplumber@gmail.com
(In reply to comment #12)

> Sorry for not replying earlier... I got caught up with other things.

No problem, didn't mean to rush you, just letting you know what I had been
fiddling around with.

> Adding
>
> ExecStartPre=/usr/sbin/sysctl net/unix/max_dgram_qlen=1000
>
> to /etc/systemd/system/systemd-journald.socket.d/config.conf file
> should work (note that this is for .socket, not .service, since the
> socket is created first).

Tried this with the value you suggested (1000), and it does clean up the
observed message drops. Now, over 10 boot sequences, each one reports 103/103
of the MOIs ("Starting"/"Started"), whereas I had been seeing roughly 90/95 in
most cases and occasionally 60/95.
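
For anyone following along, here is the complete drop-in as I have it. The
path and sysctl value are straight from comment #12; the [Socket] section
header is where ExecStartPre= goes for a socket unit:

    # /etc/systemd/system/systemd-journald.socket.d/config.conf
    [Socket]
    # Raise the AF_UNIX datagram queue length before the journal sockets
    # are created, so early-boot messages are less likely to be dropped.
    ExecStartPre=/usr/sbin/sysctl net/unix/max_dgram_qlen=1000

A "systemctl daemon-reload" (or simply the next reboot) should be enough for
systemd to pick it up.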
A few observations/comments:
1. Obvious: when the per-socket sockopt capability is added, you'll probably
want to set the qlen to a value larger than 10. :) There's nothing especially
unusual about my system, so I have to guess that this problem is occurring
with qlen=10 for a fairly large fraction of users, but simply has not been
noticed. (I didn't notice it myself until I began looking for a specific boot
message of interest and saw that it sometimes appeared and sometimes did
not.) A quick way to check the current default is shown after this list.
2. I fully understand (as you point out) that there is no guarantee that
journald can always keep up with the message generation rate with probability
= 1 in all situations, regardless of buffer sizes and queue length choices.
It's not possible to close the source->sink loop without incurring a priority
inversion between source and sink when the source rate becomes very high. I
appreciate that and am all too familiar with it. But I would opine that it
ought to be an explicit design requirement on journald, going forward, that
it report with "very high" probability when messages have been dropped,
rather than dropping them silently. That ought to be possible using mutexes
without getting into priority inversion issues. IMO, as an ordinary user,
silently dropping log messages should be considered grossly unacceptable
except under the most extreme message loading conditions (e.g. a DoS attack
:)). Silent drops should have a vanishingly small probability of occurring
during ordinary events such as boot-up.
IMO, if you don't adopt this as a basic design principle, you're shooting
yourself in the foot, PR-wise, because sooner or later people are going to
start whining about journald's logging reliability vis-a-vis "good old sysV".
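
(Re point 1: anyone curious about their own default can read it directly; 10
is the stock kernel value and what I was seeing before the drop-in:

    $ sysctl net.unix.max_dgram_qlen
    net.unix.max_dgram_qlen = 10

or equivalently "cat /proc/sys/net/unix/max_dgram_qlen".)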

> If you're using an initramfs with systemd, I think you'd need to put the
> file also in the initramfs.

I am using an initramfs, but tried it first without putting the new config
file into it, and it worked. Curious as to why, if it "shouldn't have", but
I'm not going to look this gift horse in the mouth. :)
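
(If it had turned out to be necessary, regenerating the initramfs after
adding the drop-in should do it; on a dracut-based system that would be
something like

    dracut -f

though whether dracut picks up /etc/systemd/system/*.d/ drop-ins by default
is an assumption on my part, not something I've verified.)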
One request, if you have time: could you possibly post some example output
from "journalctl -a" from some (any) systems that you might have access to?
I'd like to run them through my little analysis tool and see whether the
kinds of drops I was seeing (with qlen=10) are occurring there too. This
would also help you as developers get an idea of how often this sort of
thing is happening.
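
(Something like the following, once per boot, would be in the form my tool
expects; the output filename is just an example. -a avoids truncating long
fields and -b limits the dump to a single boot:

    journalctl -a -b > journal-boot.txt
)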
Thanks for your time and assistance.