[systemd-devel] Slow startup of systemd-journal on BTRFS

Kai Krakow hurikhan77 at gmail.com
Sat Jun 14 07:31:22 PDT 2014


Goffredo Baroncelli <kreijack at libero.it> wrote:

>> First, I've set the journal directories nocow.
> 
> If you use nocow, you lose the btrfs ability to rebuild a RAID array by
> discarding the wrong sector. With the systemd journal checksum, you can
> tell that some data is wrong, but BTRFS with its checksum (when used in
> raid mode) is able to *correct* the data too.

The decision is up to you. For me it was easy:

First: I do not care much if some of my logs become broken. I have backups 
anyway, with a backlog going back several weeks.

Second: I don't use data RAID (at least not yet), just data striping. So the 
btrfs checksum won't help me much - without a second copy it can only detect 
corruption, not repair it, which is about what the journal checksums give me 
anyway.

YMMV... For me, nocow'ing the journal was the clear winner. If you insist 
on keeping cow for the journal files, I'd stick with the defragger:

>> Back to the extents counts: What I did next was to implement a defrag job
>> that regularly defrags the journal (actually the complete log directory,
>> as other log files suffer from the same problem):
>> 
>> $ cat /usr/local/sbin/defrag-logs.rb
>> #!/bin/sh
>> exec btrfs filesystem defragment -czlib -r /var/log
> 
> I think that this is a more viable solution; but what seems strange to me
> is that the fragmentation level of my rsyslog file is a lot lower.

Are you sure? As Duncan has pointed out on many occasions, if your journal 
file is compressed by btrfs (for whatever reason, depending on your mount 
options and defragging habits), then filefrag does not report correct values: 
it counts every compressed block as a separate extent. Only once you have 
ruled that out is it worth considering further steps.
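
To illustrate, comparing the two is roughly this (filefrag itself is 
generic; the journal path is just the usual one for a persistent journal, 
and the rsyslog file may have a different name on your distribution):

$ filefrag /var/log/journal/*/system.journal     # extent count of the active journal
$ filefrag /var/log/messages                     # or whatever rsyslog writes to
$ filefrag -v /var/log/journal/*/system.journal  # per-extent listing

Keep in mind that btrfs compresses in 128 KiB blocks and filefrag reports 
each of them as its own extent, so the plain count only means something for 
uncompressed files.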

If you are sure the journal files are not compressed (and the rsyslog file 
isn't either), you can compare the numbers. Only then, I think, could you say 
that systemd's allocation strategy for journal files does no good on btrfs. 
Maybe then it's time to fix that by patching systemd to use another strategy, 
either by configuration or by auto-detection.

Most of those strategies will probably boil down to coalescing writes to the 
journal into bigger blocks, which means buffering, which in turn is usually 
not wanted for log files. That suggests that cow'ed btrfs is probably the 
wrong medium for storing journal files. You may want to play with 
SyncIntervalSec in journald.conf...
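
Untested, but the knob lives in /etc/systemd/journald.conf and raising it 
would look like this (the default is 5min, and journald still syncs 
immediately after messages of priority CRIT or higher, so this mostly delays 
flushing of ordinary entries):

[Journal]
SyncIntervalSec=15min

followed by a restart of systemd-journald.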

BTW: I'm not sure whether autodefrag would kick in on journal files, because 
those appends are not random writes. If I remember correctly, autodefrag is 
about detecting random writes and then rewriting the file as a whole to 
reduce extent fragmentation. So I can't say whether it would have any effect 
here; a btrfs dev may want to comment on that.
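
For completeness: autodefrag is a mount option, so trying it out should be 
as simple as something like the following (assuming /var is its own btrfs 
mount; adjust mount point and device to your setup):

$ mount -o remount,autodefrag /var
# or permanently via /etc/fstab, e.g.:
# /dev/sdXn  /var  btrfs  defaults,autodefrag  0  0

Whether it actually helps for journald's write pattern is exactly the open 
question above.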

-- 
Replies to list only preferred.


