<html> <head> <base href="https://bugs.freedesktop.org/" /> </head> <body> <div> <a class="bz_bug_link bz_status_NEW " title="NEW --- - journalctl boot reporting inconsistent" href="https://bugs.freedesktop.org/show_bug.cgi?id=66664#c14">Comment # 14</a> on <a class="bz_bug_link bz_status_NEW " title="NEW --- - journalctl boot reporting inconsistent" href="https://bugs.freedesktop.org/show_bug.cgi?id=66664">bug 66664</a> from <a class="email" href="mailto:hortplumber@gmail.com" title="hortplumber@gmail.com">hortplumber@gmail.com</a> <pre>(In reply to <a href="show_bug.cgi?id=66664#c12">comment #12</a>)
>
> Sorry for not replying earlier... I got caught up with other things.
>
No problem, didn't mean to rush you, just letting you know what I had been
fiddling around with.
>
> Adding
>
> ExecStartPre=/usr/sbin/sysctl net/unix/max_dgram_qlen=1000
>
> to /etc/systemd/system/systemd-journald.socket.d/config.conf file
> should work (note that this is for .socket, not .service, since the socket
> is created first).

Tried this with the value you suggested (1000), and it does clean up the
observed message drops. Now, over 10 boot sequences, each one reports 103/103
of the messages of interest ("Starting"/"Started"), whereas I had been seeing
roughly 90/95 in most cases and occasionally 60/95.

A few observations/comments:

1. Obvious: when the per-socket sockopt capability is added, you'll probably
want to set the qlen to a value larger than 10. :) There's nothing especially
unusual about my system, so I would guess that this problem is happening with
qlen=10 for a fairly large fraction of users but simply has not been noticed.
(I didn't notice it myself until I began looking for a specific boot message of
interest and saw that it sometimes appeared and sometimes did not.)

2.
I fully understand (as you point out) that there is no guarantee that journald
can always keep up with the message generation rate, with probability 1, in
all situations, regardless of buffer sizes and queue length choices. It's not
possible to close the source->sink loop without incurring a priority inversion
between source and sink when the source rate becomes very high. I appreciate
that and am all too familiar with it.

But I would opine that it ought to be an explicit design requirement on
journald going forward that it report, with very high probability, when
messages have been dropped, rather than dropping them silently. That ought to
be possible using mutexes without getting into priority inversion issues. IMO,
as an ordinary user, silently dropping log messages should be considered
grossly unacceptable except under the most extreme message loading conditions
(e.g. a DoS attack :)). Silent drops should have a vanishingly small
probability of occurrence during ordinary events such as boot-up. IMO, if you
don't adopt this as a basic design principle, you're shooting yourself in the
foot PR-wise, because sooner or later people are going to start whining about
journald logging reliability vis-a-vis "good old sysV".

>
> If you're using an initramfs with systemd, I think you'd need to put the
> file also in the initramfs.
>

I am using an initramfs, but tried it first without putting the new config
file into it, and it worked. I'm curious why, given that it "shouldn't have",
but I'm not going to look a gift horse in the mouth. :)

One request, if you have time: could you post some example output of
"journalctl -a" from some (any) systems you have access to? I'd like to run
it through my little analysis tool and see whether the kinds of drops I was
seeing (with qlen=10) are occurring there too. It might also give you, as
developers, an idea of how often this sort of thing is happening.
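For reference, a minimal sketch of the kind of check described above. This is
hypothetical and not the actual analysis tool. It assumes the usual PID 1
messages of the form "Starting Foo..." and "Started Foo." as printed by
journalctl, and flags any description that shows up in one form but not the
other, which would suggest a dropped message. The name check_start_pairs is
made up for illustration.

```python
import re
from collections import Counter

# Hypothetical patterns for systemd (PID 1) unit start messages, e.g.
#   "Jul 10 12:00:00 host systemd[1]: Starting Journal Service..."
#   "Jul 10 12:00:01 host systemd[1]: Started Journal Service."
STARTING = re.compile(r"systemd\[1\]: Starting (.+)\.\.\.$")
STARTED = re.compile(r"systemd\[1\]: Started (.+)\.$")


def check_start_pairs(lines):
    """Count "Starting"/"Started" messages and list unpaired descriptions.

    Assumption: each unit logs one "Starting X..." and one "Started X."
    per boot, so a description seen in only one form hints at a drop.
    Returns (n_starting, n_started, missing_started, missing_starting).
    """
    starting = Counter()
    started = Counter()
    for line in lines:
        line = line.rstrip("\n")
        m = STARTING.search(line)
        if m:
            starting[m.group(1)] += 1
            continue
        m = STARTED.search(line)
        if m:
            started[m.group(1)] += 1
    # Counter subtraction keeps only positive counts, i.e. the surplus side.
    missing_started = sorted((starting - started).elements())
    missing_starting = sorted((started - starting).elements())
    return (sum(starting.values()), sum(started.values()),
            missing_started, missing_starting)
```

Feeding it the output of "journalctl -b -a" (for example by passing
sys.stdin) would give per-boot tallies of the same kind as the
"Starting"/"Started" counts reported above.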
Thanks for your time and assistance.</pre> </div> </body> </html>