[systemd-devel] Slow startup of systemd-journal on BTRFS

Mon Jun 16 04:56:25 PDT 2014

On Mon, Jun 16, 2014 at 2:14 PM, Lennart Poettering
<lennart at poettering.net> wrote:
> On Mon, 16.06.14 10:17, Russell Coker (russell at coker.com.au) wrote:
>
>> > I am not really following though why this trips up btrfs though. I am
>> > not sure I understand why this breaks btrfs COW behaviour. I mean,
>> > fallocate() isn't necessarily supposed to write anything really, it's
>> > mostly about allocating disk space in advance. I would claim that
>> > journald's usage of it is very much within the entire reason why it
>> > exists...
>>
>> I don't believe that fallocate() makes any difference to fragmentation on
>> BTRFS.  Blocks will be allocated when writes occur so regardless of an
>> fallocate() call the usage pattern in systemd-journald will cause
>> fragmentation.
>
> journald's write pattern looks something like this: append something to
> the end, make sure it is written, then update a few offsets stored at
> the beginning of the file to point to the newly appended data. This is
> of course not easy to handle for COW file systems. But then again, it's
> probably not too different from access patterns of other database or
> database-like engines...
>

... which traditionally experienced severe sequential read performance
degradation in this case. As I understand it, this is exactly what
happens here: readahead attempts to preload the journal files, and
because they are fragmented, that turns into heavy random read access.

The only real remedy was to defragment the files. That should work
relatively well for the journal, where files are mostly "write once",
at the expense of additional read/write activity.
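
To make the access pattern quoted above concrete, here is a rough
sketch of "append, sync, then update an offset at the start of the
file". This is not journald's actual on-disk format: the 8-byte header,
the record contents and the names append_record and demo are all made
up for illustration. On a COW filesystem each in-place header update
is written to a new location, which is how a file written this way
ends up in many extents.

```python
import os
import struct
import tempfile

# Hypothetical 8-byte header holding the offset of the newest record.
# This layout is an assumption for illustration, not journald's format.
HEADER = struct.Struct("<Q")


def append_record(fd: int, payload: bytes) -> int:
    """Append payload, sync it, then point the header at it."""
    offset = os.lseek(fd, 0, os.SEEK_END)
    os.write(fd, payload)
    os.fsync(fd)  # make sure the appended data is written first
    # In-place update at the beginning of the file. On a COW
    # filesystem this block is rewritten somewhere else each time.
    os.pwrite(fd, HEADER.pack(offset), 0)
    os.fsync(fd)
    return offset


def demo() -> tuple[int, int]:
    """Write three records; return (header value, file size)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.journal")
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            os.write(fd, HEADER.pack(0))  # initial, empty header
            for i in range(3):
                append_record(fd, f"entry {i}\n".encode())
            (last,) = HEADER.unpack(os.pread(fd, HEADER.size, 0))
            size = os.fstat(fd).st_size
        finally:
            os.close(fd)
    return last, size
```

Calling demo() performs three append-plus-header-update cycles, so even
this tiny file has had its first block rewritten three times.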