[systemd-devel] [ANNOUNCE] Journal File Format Documentation

Tue Oct 23 08:48:09 PDT 2012

On Tue, Oct 23, 2012 at 5:39 PM, Lennart Poettering
<lennart at poettering.net> wrote:
> On Tue, 23.10.12 15:25, Ciprian Dorin Craciun (ciprian.craciun at gmail.com) wrote:
>>     Why did you resort to implementing a new database format instead
>> of choosing an existing embedded library like BerkeleyDB, LevelDB,
>> etc.? (Advantages / disadvantages?)
>
> There are a number of reasons, which one could summarize as: because
> there is no existing database implementation that would fit the bill:
>
> - We needed something small, embeddable, in pure C, so that we can pull
>   it in everywhere. It also had to have a somewhat stable API, a sanely
>   managed upstream, and be Free Software. We are OK with adding deps to
>   systemd if there's a good reason to and the dep is well managed. It
>   needed to be OOM safe.
>
> - We wanted something robust against IO failures that focuses on
>   appending new data to the end, rather than overwriting data constantly.
>
> - We needed something with in-line compression, and where we can add
>   stuff like FSS to.

    OK. I agree that there are very few libraries that fit here. The only
one I can think of that comes close is BerkeleyDB (but it fails other
requirements you've listed below).
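    To make the append-only point concrete for myself: if records are only ever appended, an interrupted write can damage at most the last record, and a reader can recover the intact prefix by walking the records from the start. A minimal sketch in C, assuming a made-up layout (a native-endian uint32_t length followed by the payload); `valid_prefix()` is a hypothetical helper and this is not the journal's actual on-disk format:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout, for illustration only:
 *   uint32_t len (native byte order), then len bytes of payload.
 * Returns the number of bytes covered by complete records and stores
 * the record count in *nrecords (if non-NULL). Anything past the
 * returned offset is a torn tail left by an interrupted append. */
static size_t valid_prefix(const unsigned char *buf, size_t size,
                           size_t *nrecords)
{
    size_t off = 0, count = 0;

    assert(buf != NULL || size == 0);
    while (size - off >= sizeof(uint32_t)) {
        uint32_t len;

        memcpy(&len, buf + off, sizeof len);   /* avoids unaligned reads */
        if (len > size - off - sizeof len)
            break;                             /* header present, payload cut short */
        off += sizeof len + len;
        count++;
    }
    if (nrecords != NULL)
        *nrecords = count;
    return off;
}
```

A writer that overwrote records in place would have no such clean boundary: a failed write could leave a corrupted record in the middle of the file.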


> - The database should be typeless, and index by all fields, rather than
>   require fixed schemas. It should be efficient with large and binary data.

    One thing bothers me: why should it index all fields? (For example,
indexing by UID, executable, service, etc. makes sense, but I don't
think indexing by message is that worthwhile... Indexing by PID or by
coredump (which the document hints is stored in the journal) doesn't
make much sense either...)
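    For reference, this is how I picture "index by all fields": a generic inverted index in which each distinct FIELD=value string gets a list of the entries that carry it. A minimal in-memory sketch in C; the structures and `index_entry()` / `count_matches()` are hypothetical and fixed-size for brevity, not the journal's on-disk format. Only the field names (MESSAGE, _PID, _SYSTEMD_UNIT) are the usual journal ones:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, fixed-size in-memory inverted index: one posting list of
 * entry numbers per distinct "FIELD=value" string. Not the on-disk format. */
#define MAX_KEYS     64
#define MAX_POSTINGS 64
#define MAX_KEY_LEN  128

struct posting_list {
    char key[MAX_KEY_LEN];            /* e.g. "_SYSTEMD_UNIT=sshd.service" */
    unsigned entries[MAX_POSTINGS];   /* entry numbers, in the order added */
    size_t count;
};

struct field_index {
    struct posting_list lists[MAX_KEYS];
    size_t nkeys;
};

static struct posting_list *find_list(struct field_index *idx, const char *key)
{
    for (size_t i = 0; i < idx->nkeys; i++)
        if (strcmp(idx->lists[i].key, key) == 0)
            return &idx->lists[i];
    return NULL;   /* linear scan for brevity; a real index would hash or sort */
}

/* Add entry_no to the posting list of every field the entry carries.
 * Returns 0 on success, -1 if a fixed-size limit is hit. */
static int index_entry(struct field_index *idx, unsigned entry_no,
                       const char *const *fields, size_t nfields)
{
    assert(fields != NULL || nfields == 0);
    for (size_t i = 0; i < nfields; i++) {
        struct posting_list *pl = find_list(idx, fields[i]);

        if (pl == NULL) {   /* first time this FIELD=value is seen */
            if (idx->nkeys == MAX_KEYS || strlen(fields[i]) >= MAX_KEY_LEN)
                return -1;
            pl = &idx->lists[idx->nkeys++];
            strcpy(pl->key, fields[i]);
            pl->count = 0;
        }
        if (pl->count == MAX_POSTINGS)
            return -1;
        /* stays sorted as long as entry numbers are assigned increasingly */
        pl->entries[pl->count++] = entry_no;
    }
    return 0;
}

/* How many indexed entries carry exactly this FIELD=value pair. */
static size_t count_matches(struct field_index *idx, const char *key)
{
    const struct posting_list *pl = find_list(idx, key);
    return pl != NULL ? pl->count : 0;
}
```

It shows the trade-off I'm asking about: a match on any field works without a predeclared schema, but every unique MESSAGE or _PID value gets its own list, which is mostly overhead for values that are rarely queried.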


> - It should not require file locks or communication between multiple
>   readers or between readers and the writer. This is primarily a
>   question of security (we cannot allow users to lock out root or the
>   writer from accessing the logs by taking a lock) and network
>   transparency (file locks on network FS are very very flaky), but also
>   performance.

    From what I see this is the best reason for the current proposal.
Indeed no embedded database library (that I know of) allows both
reading and writing at the same time from multiple processes without
locking. (Except maybe DJB's CDB and the TinyCDB implementation, but
that wouldn't fit the bill here.)
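    As far as I understand it, the lock-free property mostly follows from having a single, append-only writer. A minimal sketch of that general technique (not necessarily what journald actually does): the writer copies a record into place and only then advances a published tail offset with a release store; readers load the tail with an acquire load and never read past it, so they never see half-written data and never hold anything that could block the writer. `struct append_log`, `log_append()` and `log_read()` are hypothetical names. This uses C11 atomics within one process; across processes the structure would have to live in a shared mapping and the atomic would need to be lock-free:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical single-writer, multi-reader append-only buffer.
 * Readers trust only the bytes below the published tail offset. */
#define LOG_CAPACITY 4096

struct append_log {
    _Atomic size_t tail;               /* bytes published so far */
    unsigned char data[LOG_CAPACITY];
};

/* Writer side (exactly one writer): copy the bytes first, then publish
 * them by advancing tail with release ordering. */
static int log_append(struct append_log *alog, const void *rec, size_t len)
{
    /* relaxed is enough here: only the writer ever stores to tail */
    size_t t = atomic_load_explicit(&alog->tail, memory_order_relaxed);

    assert(t <= LOG_CAPACITY);
    if (len > LOG_CAPACITY - t)
        return -1;                     /* full; nothing is overwritten */
    memcpy(alog->data + t, rec, len);
    atomic_store_explicit(&alog->tail, t + len, memory_order_release);
    return 0;
}

/* Reader side: takes no lock, so it can never stall the writer. The
 * acquire load pairs with the writer's release store, so every byte
 * below the observed tail is fully written. Copies up to outlen bytes
 * starting at offset from and returns the number copied. */
static size_t log_read(struct append_log *alog, size_t from,
                       void *out, size_t outlen)
{
    size_t t = atomic_load_explicit(&alog->tail, memory_order_acquire);
    size_t n;

    if (from >= t)
        return 0;                      /* nothing new published yet */
    n = t - from;
    if (n > outlen)
        n = outlen;
    memcpy(out, alog->data + from, n);
    return n;
}
```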

    Maybe this should go at the top of that document, as the answer to "why?".


> These are the strong requirements, but there are other things to
> keep in mind: because of the structure of log data, which knows no
> changes but only appends and the occasional deletion of large chunks,
> and where data is generally monotonically ordered, you can do a lot of
> things you cannot do in normal databases.

    Although I partially agree about this increased flexibility,
having a custom format makes it very easy to just start "adding
features", thus accumulating cruft... Maybe a general-purpose
system would have limited this tendency.
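    As an example of the kind of shortcut monotonic ordering does allow: if entries are appended with non-decreasing timestamps, seeking to a point in time is a plain binary search over the entry order, with no separate B-tree to maintain. A minimal sketch in C; `seek_timestamp()` is a hypothetical helper, and it assumes the order really is non-decreasing (the quote says "generally", so real data would need a fallback for when a clock jumps):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first entry whose timestamp is >= target,
 * or n if there is none. Assumes ts[] is non-decreasing, which an
 * append-only log written with a monotonic clock would give for free.
 * O(log n) comparisons. */
static size_t seek_timestamp(const uint64_t *ts, size_t n, uint64_t target)
{
    size_t lo = 0, hi = n;

    assert(ts != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;   /* avoids overflow of lo + hi */

        if (ts[mid] < target)
            lo = mid + 1;                  /* answer lies strictly after mid */
        else
            hi = mid;                      /* mid is a candidate */
    }
    return lo;
}
```

For timestamps {100, 200, 200, 350, 500}, seeking to 200 lands on index 1 (the first of the two equal entries) and seeking to 201 lands on index 3.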


> rsyslog apparently chose to use ElasticSearch. I think ElasticSearch is
> cool, but it already fails for us on the most superficial of things, in
> that it would be quite ridiculous to pull in Java into all systems for
> that... ;-)

    I wouldn't even suggest such a solution. (Or at least not for a
standalone computer's logging system.)


    BTW, a little bit off-topic:
    * why didn't you try to implement this journal system as a
standalone library that could be reused by other systems
independently of systemd? (I know this was answered in [2], and that
the focus is on systemd, but it seems it took quite a lot of work, and
it's a pity it can't be reused on its own.)
    * how do you intend to implement something resembling syslog's
log centralization?

    Ciprian.