[systemd-devel] Fragile journal interleaving

Lennart Poettering lennart at poettering.net
Tue Dec 12 20:38:06 UTC 2017


On Tue, 12.12.17 21:00, Uoti Urpala (uoti.urpala at pp1.inet.fi) wrote:

> On Tue, 2017-12-12 at 17:09 +0100, Lennart Poettering wrote:
> > On Mon, 11.12.17 00:36, Uoti Urpala (uoti.urpala at pp1.inet.fi) wrote:
> > > consider a clear bug: there's code in next_beyond_location() which
> > > skips the next entry in a file if it's not in the expected direction
> > > from the previous globally iterated entry, and this can discard valid
> > > entries. A comment there says it's meant to discard duplicate entries
> > > which were somehow recorded in multiple journal files (which I'd assume
> > > to compare equal), but it also discards non-duplicate entries which
> > > compare backwards from the previously shown one.
> > 
> > Note that two entries will only compare as fully identical if their
> > "xor_hash" is equal too. The xor_hash is the XOR combination of the
> > hashes of all of the entry's fields. That means realistically only
> > records that actually are identical should be considered as such.
> 
> I assume that would be suitable for handling the case of actual
> duplicates? How do those happen anyway?

It's supposed to handle people copying journal files around between
machines: backups, rsync/scp and so on. It's also supposed to deal
with remote push/pull getting out of sync and so on.
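
To illustrate what "fully identical" means here, a simplified,
hypothetical sketch (invented struct and field names, not the actual
on-disk journal format): two entries only count as copies of the same
record if all identifying properties match, including the xor_hash
over all fields.

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified entry header; the field names are
 * illustrative only, not the real journal format. */
typedef struct Entry {
        uint64_t seqnum;        /* sequence number within the file */
        uint64_t realtime;      /* CLOCK_REALTIME timestamp, in µs */
        uint64_t monotonic;     /* CLOCK_MONOTONIC timestamp, in µs */
        uint64_t xor_hash;      /* XOR of the hashes of all entry fields */
        uint8_t boot_id[16];    /* boot the entry was logged in */
} Entry;

/* Two entries from different files are treated as copies of the same
 * record only if everything matches, including the xor_hash. */
static bool entry_is_duplicate(const Entry *a, const Entry *b) {
        return a->seqnum == b->seqnum &&
               a->realtime == b->realtime &&
               a->monotonic == b->monotonic &&
               a->xor_hash == b->xor_hash &&
               memcmp(a->boot_id, b->boot_id, sizeof(a->boot_id)) == 0;
}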

> > I am not entirely sure what we can do here. Maybe this can work: beef
> > up the comparison logic so that it returns more than
> > smaller/equal/larger but also a special value "ambiguous". And when
> > that is returned we don't enforce monotonicity strictly but instead go
> > record-by-record, if you follow what I mean?
> 
> I don't see how that would help, at least not without some extra
> assumptions/changes. In my example problem case above, the ambiguous
> comparisons happen when deciding which file to get the first entry
> from. There's no natural default "first file", so even if you only know
> it's ambiguous you have to pick some anyway. If you pick the one the
> current code does, the following discard check is not ambiguous - it's
> discarding entries with earlier realtime and non-comparable other
> values. Or do you mean that if an ambiguous comparison was EVER seen,
> monotonicity would be permanently disabled? I don't really see an
> advantage for that over just not enforcing monotonicity at all, and
> handling any added-file special cases separately.

Hmm, I see, the problem is that the ambiguity is not obvious when just
looking at two of the entries...
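
For concreteness, the "ambiguous" idea would have been roughly a
four-valued comparison, something like this hypothetical sketch (the
names are invented):

/* Hypothetical result of comparing two entries: besides the usual
 * three-way ordering there's a fourth state for entries that share no
 * common ordering key (different seqnum_id and different boot_id), so
 * that neither order is justified. */
typedef enum EntryOrder {
        ORDER_SMALLER,
        ORDER_EQUAL,
        ORDER_LARGER,
        ORDER_AMBIGUOUS,
} EntryOrder;

EntryOrder entry_order(const Entry *a, const Entry *b);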

Maybe the approach needs to be that we immediately increase the read
record pointer of a specific file by one when we read from it, so that
we know we monotonically progress through the file. And then change
the logic that looks for the next entry across all files to honour
that, and then simply skip over fully identical entries, but not
insist on monotonic timestamps otherwise.
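
In rough sketch form (hypothetical code, reusing the Entry sketch from
above plus an assumed three-way entry_compare() helper; this is not
the actual sd-journal implementation), the idea would be:

#include <stddef.h>

/* Hypothetical three-way comparison, assumed to return <0, 0 or >0;
 * ambiguous pairs fall back to some arbitrary but stable order. */
int entry_compare(const Entry *a, const Entry *b);

/* Per-file read pointer that only ever advances, so the iteration can
 * never enter a loop even if entries across files do not sort
 * consistently. */
typedef struct JournalFileIter {
        const Entry *entries;   /* this file's entries, in file order */
        size_t n_entries;
        size_t index;           /* next entry to hand out */
} JournalFileIter;

/* Among the files' current entries pick the one that sorts earliest,
 * advance only that file's pointer, and skip a candidate that is
 * fully identical to the previously returned entry. No global
 * monotonicity is enforced. */
static const Entry *next_entry(JournalFileIter *files, size_t n_files,
                               const Entry *previous) {
        for (;;) {
                const Entry *best = NULL;
                size_t best_file = 0;

                for (size_t i = 0; i < n_files; i++) {
                        if (files[i].index >= files[i].n_entries)
                                continue;
                        const Entry *e = files[i].entries + files[i].index;
                        if (!best || entry_compare(e, best) < 0) {
                                best = e;
                                best_file = i;
                        }
                }
                if (!best)
                        return NULL;            /* all files exhausted */

                files[best_file].index++;       /* always move forward */

                if (previous && entry_is_duplicate(best, previous))
                        continue;               /* drop exact duplicate */

                return best;
        }
}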

With that approach we can be sure that we never enter a loop, and
we'll mostly filter out duplicates (well, except if duplicate entries
show up in multiple files in different orders, but that's OK then, I
guess).

Lennart

-- 
Lennart Poettering, Red Hat

