[systemd-bugs] [Bug 66664] journalctl boot reporting inconsistent
bugzilla-daemon at freedesktop.org
Sat Jul 13 20:18:48 PDT 2013
https://bugs.freedesktop.org/show_bug.cgi?id=66664
--- Comment #12 from Zbigniew Jedrzejewski-Szmek <zbyszek at in.waw.pl> ---
(In reply to comment #10)
Sorry for not replying earlier... I got caught up with other things.
> Not having heard anything further on this since last week, I tried
>
> [service]
> Nice=-10
>
> in the nice.conf file. This seems to avoid the "Assignment outside of section"
> complaint. Not sure if "Service" was your intended section, but no other
> choice seemed to make much sense.
Yes, that is the way that this was intended to look.
> The results are shown in the attachment. Summary: No observable qualitative
> difference from the earlier results posted in Comments #7 and #8 and the
> original posting on the Arch forum: The distribution of the number of dropped
> messages is still roughly bimodal, with the "better" cases dropping roughly
> 5% of messages from every boot sequence, and the "worse" cases dropping
> around 30%.
Thank you for the extensive testing. In principle, there's no mechanism to
ensure that journald is able to process messages in time. But it usually works
fine. Since you removed the burst limits and are still observing dropped
messages, we can guess that journald is simply not able to process
messages fast enough. Apparently changing the nice level is not enough.
There's one last thing that you might try, see below.
> I also wrote a tool to do the journalctl log analysis, i.e. identify missing
> messages among a set of boot cycles, to avoid the tedious work of
> hand-comparison. So if there are further experiments with process prioritization
> or buffering (or whatever else) you might like to try, the results of those
> experiments can now be obtained fairly quickly.
>
> That tool was used to generate the results in the attachment. In that output,
> "MOI's" means messages of interest, presently comprising only the "Starting"
> and "Started" messages generated by systemd (because they are syntactically
> consistent across boot sequences). Even though this is just a small subset
> of the total messages generated during a boot sequence, it seems sufficient
> to demonstrate the problem, i.e. that journald presently has some serious
> reliability issues. Needless to say, diagnosis of boot problems is made
> significantly more complicated when boot-time messages are frequently and
> silently being dropped from the logs.
>
> I have some past experience dealing with issues of this sort, i.e. diagnostic
> logging processes competing for resources (realtime and buffers) to keep up
> with bursty message generation. If you're interested perhaps we can discuss,
> either here or on the systemd ML.
>
> Let me know if you need any more info or example results.
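The comparison described above can be sketched roughly as follows. This is not the reporter's tool, only a minimal illustration of the idea, assuming each boot's log is already available as a list of message strings. The names `MOI_PATTERN`, `messages_of_interest` and `missing_per_boot` are hypothetical.

```python
import re

# Assumption: a "message of interest" (MOI) is any line that begins with
# "Starting " or "Started ", as described in the quoted text above.
MOI_PATTERN = re.compile(r"^(Starting|Started) ")


def messages_of_interest(lines):
    """Return the set of MOIs found in one boot's log lines."""
    return {line.strip() for line in lines if MOI_PATTERN.match(line.strip())}


def missing_per_boot(boots):
    """Map each boot to the MOIs seen in some other boot but not in this one.

    `boots` maps a boot label to that boot's list of log lines.
    """
    moi = {boot: messages_of_interest(lines) for boot, lines in boots.items()}
    # The union over all boots approximates the "complete" set of MOIs.
    expected = set().union(*moi.values())
    return {boot: expected - seen for boot, seen in moi.items()}


if __name__ == "__main__":
    sample = {
        "boot-1": ["Starting Foo...", "Started Foo.", "unrelated line"],
        "boot-2": ["Starting Foo..."],
    }
    for boot, missing in sorted(missing_per_boot(sample).items()):
        print(boot, "missing:", sorted(missing))
```

Note that a message dropped in every boot would go unnoticed by this approach, since the expected set is built only from what was actually logged.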
By default on most distros, the queue for messages on the socket holds only
10 datagrams.
The value is set in /proc/sys/net/unix/max_dgram_qlen. Increasing it would be
beneficial for journald, but this is a global setting, and we're waiting for
the superior solution of adding a sockopt to modify this for a specific socket.
But you can increase it for debugging. It has to be increased before the
journal socket is created. sysctl is usually run after journald has been
started, so it must be set in a different way. Adding
[Socket]
ExecStartPre=/usr/sbin/sysctl net/unix/max_dgram_qlen=1000
to the file /etc/systemd/system/systemd-journald.socket.d/config.conf
should work (note that this is for .socket, not .service, since the socket is
created first, and that the assignment has to be in the [Socket] section).
If you're using an initramfs with systemd, I think you'd also need to put the
file in the initramfs.
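To check which value is in effect after boot, something like the following minimal sketch could be used. It only reads the /proc file mentioned above; the function name `read_max_dgram_qlen` is hypothetical. It cannot tell whether the value was already in place when the journal socket was created, only what the value is now.

```python
from pathlib import Path

# Path taken from the message above; the kernel exposes the global limit here.
QLEN_PATH = Path("/proc/sys/net/unix/max_dgram_qlen")


def read_max_dgram_qlen(path=QLEN_PATH):
    """Return the current limit as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # File missing (non-Linux system), unreadable, or not a number.
        return None


if __name__ == "__main__":
    value = read_max_dgram_qlen()
    if value is None:
        print("could not read", QLEN_PATH)
    else:
        print("net.unix.max_dgram_qlen =", value)
```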