[systemd-devel] Fragile journal interleaving

Uoti Urpala uoti.urpala at pp1.inet.fi
Sun Dec 10 22:36:21 UTC 2017


The code that tries to determine which of two journal entries is
"earlier" (similar logic in journal_file_comparare_locations() and
compare_with_location()) does this by trying different progressively
less reliable ways to compare - first by seqnum if seqnum id matches,
then by monotonic time if boot id matches, then by realtime timestamp.
Unfortunately, while this method could be said to give the "right"
answer more often than just comparing by the always-available realtime
timestamp, it does not give a valid ordering relation. As in, unless
you can already rely on all those comparisons giving the same
consistent order (when you could probably just use realtime anyway),
it's possible that you get a relation like A<B<C<A.

For example, these would have such a relation (assume that the omitted
monotonic values never determine order):
A: seqnum_id=0, seqnum=0, time=2
B: seqnum_id=0, seqnum=1, time=0
C: seqnum_id=1, seqnum=0, time=1

A<B by seqnum, B<C by time, C<A by time.

Such ordering means that the code interleaving entries from different
files can give rather arbitrary results depending on exactly which
order the entries are tested in. It triggers at least one issue that I
consider a clear bug: there's code in next_beyond_location() which
skips the next entry in a file if it's not in the expected direction
from the previous globally iterated entry, and this can discard valid
entries. A comment there says it's meant to discard duplicate entries
which were somehow recorded in multiple journal files (which I'd assume
to compare equal), but it also discards non-duplicate entries which
compare backwards from the previously shown one.

The reason I originally looked into this code was a set of old journal
files for which plain "journalctl" only showed 5 days of entries
despite the files containing months of data. This was triggered by a
user journal file which matched seqnum ids with some system journal
files, but had smaller seqnums from a much later time (the earliest
logs were old enough that the file could have been initialized by
journald before commit cbd671772c, so the exact same way to trigger
issues probably won't occur with files created by later versions). The
code searching for earlier entries first found the early system
journal, then moved to the user journal because it had smaller seqnums,
and finally moved to some other file that started 5 days before the end
(earlier than the user journal and with a different seqnum id - there
didn't happen to be other earlier files later in the iteration order).
After printing one entry from there, the next_beyond_location()
"duplicate" code then discarded all the earlier valid entries.

I'm not sure about the best way to fix these issues - are there
supposed to be any guarantees about interleaving? At the very least the
"duplicate" check should be fixed to not throw away arbitrary amounts
of valid entries. Any other code relying on assumptions about valid
ordering? Is the interleaving order supposed to be stable independent
of things like in which order the files are iterated over?



More information about the systemd-devel mailing list