[systemd-bugs] [Bug 64116] How does one fix journal corruptions?

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Wed Oct 8 13:27:49 PDT 2014


https://bugs.freedesktop.org/show_bug.cgi?id=64116

--- Comment #3 from Lennart Poettering <lennart at poettering.net> ---
Since this Bugzilla report is apparently sometimes linked these days as an
example of how we wouldn't fix a major bug in systemd:

Journal files are mostly append-only files. We keep adding to the end as we go,
only updating minimal indexes and bookkeeping data in the earlier parts of the
files. These files are rotated (rotation = renamed and replaced by a new one)
from time to time, based on certain conditions, such as time, file size, and
also when we find the files to be corrupted. As soon as they are rotated they
are entirely read-only and never modified again. When you use a tool like
"journalctl" to read the journal files, both the active and the rotated files
are implicitly merged, so that they appear as a single stream again.
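
To illustrate the merging described above, here is a minimal sketch in Python.
It is a toy model only, not the journal's actual file format or API: each
"file" is assumed to be a list of (timestamp, message) tuples that is already
sorted by timestamp, and the names merged_stream, rotated_1, rotated_2 and
active are made up for this example.

```python
import heapq


def merged_stream(files):
    """Lazily merge several timestamp-sorted entry lists into one stream.

    Assumption: every input list is already sorted by its first field.
    """
    return heapq.merge(*files, key=lambda entry: entry[0])


# Two rotated (read-only) files and the currently active one (toy data).
rotated_1 = [(1, "boot"), (4, "service started")]
rotated_2 = [(2, "network up"), (5, "login")]
active = [(6, "cron job"), (7, "shutdown requested")]

for timestamp, message in merged_stream([rotated_1, rotated_2, active]):
    print(timestamp, message)
```

The reader sees one ordered stream and does not need to know which file an
entry came from.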

Now, our strategy to rotate-on-corruption is the safest thing we can do, as we
make sure that the internal corruption is frozen in time, and that no tool
attempts to "fix" it and possibly ends up making things worse. After all, if
the often-run writing code really fucks something up, then it is not
necessarily a good idea to try to make it better by running a tool on it that
tries to fix it up again, a tool that is necessarily a lot more complex, and
also less tested.

Now, of course, having corrupted files isn't great, and we should make sure the
files stay as accessible as possible even when corrupted. Hence: the code that
reads the journal files is actually written in a way that tries to make the
best of corrupted files, and tries to read as much from them as possible, using
the subset of the file that is still valid. We do this implicitly on every
access.

Hence: journalctl implicitly does on read what a theoretical journal file fsck
tool would do, but without actually making this persistent. This logic also has
a major benefit: as our reader gets better and learns to deal with more types
of corruption you immediately benefit from it, even for old files!
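
The "fix on read, never on disk" idea can be sketched as follows. This is a toy
record format invented for illustration (a length and a CRC32 in front of each
payload), not the real journal file layout, and encode and read_valid are
hypothetical helper names. The point is only that the reader validates each
record, returns whatever is still good, and leaves the input untouched.

```python
import struct
import zlib

# Toy record header: payload length and CRC32 of the payload, little-endian.
HEADER = struct.Struct("<II")


def encode(payloads):
    """Serialize payloads into the toy format (stands in for the writer)."""
    out = bytearray()
    for payload in payloads:
        out += HEADER.pack(len(payload), zlib.crc32(payload)) + payload
    return bytes(out)


def read_valid(data):
    """Return the payloads that pass validation; the input is never modified."""
    good = []
    offset = 0
    while offset + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(data):
            break  # truncated record: stop and keep what we already have
        payload = data[start:end]
        if zlib.crc32(payload) == crc:
            good.append(payload)
        # A record with a bad checksum is skipped, not repaired.
        offset = end
    return good


data = bytearray(encode([b"one", b"two", b"three"]))
data[HEADER.size + 3 + HEADER.size] ^= 0xFF  # damage the second payload
print(read_valid(bytes(data)))
```

Because nothing is written back, a later and smarter version of read_valid
could be run on the very same damaged bytes and might recover more from them.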

File systems such as ext4 have an fsck tool since they don't have the luxury of
just rotating the fs away and fixing the structure on read: they have to use the
same file system for all future writes, and they thus need to try hard to make
the existing data workable again.

I hope this explains the rationale here a bit more.
