[systemd-devel] [ANNOUNCE] Journal File Format Documentation

Tue Oct 23 07:39:29 PDT 2012

On Tue, 23.10.12 15:25, Ciprian Dorin Craciun (ciprian.craciun at gmail.com) wrote:

>     But what I couldn't find in any of these documents (maybe there is
> in another one), is a justification of the current technical (i.e.
> implementation) decisions. Mainly:

That's a valid question to raise.

>     Why did you resort to implementing a new database format, and
> didn't choose an existing embedded library like BerkeleyDB, LevelDB,
> etc.? (Advantages / disadvantages?)

There are a number of reasons, which one could summarize as: there is
no existing database implementation that would fit the bill. In detail:

- We needed something small, embeddable and in pure C, so that we can
  pull it in everywhere; something that has a somewhat stable API, is
  sanely managed upstream, and is Free Software. We are OK with adding
  deps to systemd if there's a good reason to and the dep is well
  managed. It also needed to be OOM safe.

- The database should be typeless, and index by all fields, rather than
  require fixed schemas. It should be efficient with large and binary
  data.

- It should not require file locks or communication between multiple
  readers or between readers and the writer. This is primarily a
  question of security (we cannot allow users to lock out root or the
  writer from accessing the logs by taking a lock) and network
  transparency (file locks on network FS are very very flaky), but also
  performance.

- We wanted something robust against IO failures that focusses on
  appending new data to the end, rather than overwriting data constantly.

- We needed something with in-line compression, and to which we can add
  stuff like FSS.
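
To make the "typeless, indexed by all fields, append-only" requirements
a bit more concrete, here is a minimal in-memory sketch. It is only an
illustration and not the journal's on-disk format or API: the class
AppendOnlyLog and its methods are hypothetical names, and the field
names are merely examples.

```python
from collections import defaultdict


class AppendOnlyLog:
    """Toy model: entries are only ever appended, never rewritten."""

    def __init__(self):
        self.entries = []               # entry number -> {field: bytes}
        self.index = defaultdict(list)  # (field, value) -> entry numbers

    @staticmethod
    def _raw(value):
        # Typeless: every value is stored as opaque bytes.
        return value if isinstance(value, bytes) else str(value).encode()

    def append(self, **fields):
        # No schema: any field name is accepted, and every field is indexed.
        entry = {name: self._raw(value) for name, value in fields.items()}
        seqnum = len(self.entries)
        self.entries.append(entry)
        for item in entry.items():
            self.index[item].append(seqnum)
        return seqnum

    def match(self, field, value):
        # Look up entries by any field without scanning the whole log.
        key = (field, self._raw(value))
        return [self.entries[i] for i in self.index.get(key, [])]


log = AppendOnlyLog()
log.append(MESSAGE="disk full", PRIORITY=3, _PID=812)
log.append(MESSAGE="started", UNIT="foo.service")
log.append(MESSAGE=b"\x00\xffraw", PRIORITY=3)  # binary payloads are fine
hits = log.match("PRIORITY", 3)
```

Because entries are only added at the end, each per-field index list
stays sorted by entry number for free, and a reader never sees data it
has already read change under it.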

These are the strong requirements, but there are more things to keep
in mind: because of the structure of log data, which knows no changes
but only appends and the occasional deletion of large chunks, and
where data is generally monotonically ordered, you can do a lot of
things you cannot do in normal databases.
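
As one example of what monotonic ordering buys you, here is a small
sketch, assuming entries carry non-decreasing timestamps. Seeking to a
point in time becomes a binary search, and dropping old data becomes
cutting off a prefix. The helper names seek() and rotate() and the
sample timestamps are made up for illustration; this is not how the
journal implements either operation.

```python
from bisect import bisect_left

# Hypothetical timestamps of appended entries, in arrival order.
# Appending in time order keeps this list sorted without extra work.
timestamps = [100, 105, 105, 130, 170, 220, 300]


def seek(ts_list, since):
    """Return the position of the first entry with timestamp >= since."""
    return bisect_left(ts_list, since)  # O(log n), no full scan needed


def rotate(ts_list, cutoff):
    """Drop everything older than cutoff as one large chunk."""
    return ts_list[seek(ts_list, cutoff):]


start = seek(timestamps, 130)
recent = timestamps[start:]
kept = rotate(timestamps, 170)
```

A general-purpose database cannot assume either property, since rows may
be updated or inserted in arbitrary order.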

rsyslog apparently chose to use ElasticSearch. I think ElasticSearch is
cool, but it already fails for us on the most superficial of things, in
that it would be quite ridiculous to pull in Java into all systems for
that... ;-)

Lennart

-- 
Lennart Poettering - Red Hat, Inc.