[systemd-devel] journal fragmentation on Btrfs

Andrei Borzenkov arvidjaar at gmail.com
Tue Apr 18 04:05:53 UTC 2017


18.04.2017 06:50, Chris Murphy wrote:
> On Mon, Apr 17, 2017 at 9:42 PM, Andrei Borzenkov <arvidjaar at gmail.com> wrote:
>> 17.04.2017 22:49, Chris Murphy wrote:
>>> On Mon, Apr 17, 2017 at 11:27 AM, Andrei Borzenkov <arvidjaar at gmail.com> wrote:
>>>> 17.04.2017 19:25, Chris Murphy wrote:
>>>>> This explains one system's fragmented journals; but the other system
>>>>> isn't snapshotting journals and I haven't figured out why they're so
>>>>> fragmented. No snapshots, and they are all +C at create time
>>>>> (systemd-journald default on Btrfs). Is it possible to prevent
>>>>> journald from setting +C on /var/log/journal and
>>>>> /var/log/journal/<machineid>? If I remove them, at next boot they get
>>>>> reset, so any new journals created inherit that.
>>>>>
>>>>
>>>> Yes, should be possible by creating empty
>>>> /etc/tmpfiles.d/journal-nocow.conf.
>>>
>>> OK super.
>>>
>>> How about inhibiting the defragmentation on rotate? I'm suspicious one
>>> of the things I'm seeing is due to ssd optimization mount options, but
>>> I need to see the predefrag state of the files.
>>>
>>> Why do I see so many changes to the journal file, once every 2-5
>>> seconds? This adds 4096 byte blocks to the file each time, and when
>>> cow, that'd explain why there are so many fragments.
>>>
>>
>>
>> What exactly does "changes" mean? A write() syscall?
> 
> filefrag reported entries increase, it's using FIEMAP.
> 

So far it sounds like btrfs allocates a new extent on every write to
the journal file. Each journal record itself is indeed relatively small.
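
One way to check this is to sample the extent count at a fixed interval
and see whether it grows in step with new log records. A minimal sketch,
assuming filefrag (from e2fsprogs) is installed and prints its usual
"N extents found" summary line; the function names and the 5-second
default interval are only illustrative:

```shell
#!/bin/sh
# extent_count: read filefrag output on stdin and print only the number
# from its "<file>: N extent(s) found" summary line.
extent_count() {
    sed -n 's/^.*: \([0-9][0-9]*\) extents\{0,1\} found.*$/\1/p'
}

# watch_extents FILE [INTERVAL]: print a timestamp and the current extent
# count every INTERVAL seconds (default 5) until interrupted.
watch_extents() {
    file=$1
    interval=${2:-5}
    while :; do
        printf '%s %s\n' "$(date +%T)" "$(filefrag "$file" | extent_count)"
        sleep "$interval"
    done
}

# Example (run as root; the path is left as a placeholder):
#   watch_extents /var/log/journal/<machineid>/system.journal
```

If the count goes up by one for every logged message, that would support
the one-extent-per-write theory.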

> Also with stat I see the times (all three) change on the file. If I go
> to GNOME Terminal and just sudo some command, that itself causes the
> current system.journal file to get all three times modified. It
> happens immediately, there's no delay. So if I'm doing something like
> drm.debug=0x1e, which is spitting a bunch of stuff to dmesg and thus
> the journal, it's just constantly writing stuff to the journal. This
> is without anything running journalctl -f or reading the journal.
> 
>>
>>> #Storage=auto
>>> #Compress=yes
>>> #Seal=yes
>>> #SplitMode=uid
>>> #SyncIntervalSec=5m
>>
>> This controls how often systemd calls fsync() on the currently active
>> journal file. Do you see an fsync() every 3 seconds?
> 
> I have no idea if it's fsync or what. How can I tell?
> 

strace -p $(pgrep systemd-journal)

You will not see the actual writes, because the file is memory mapped,
but journald definitely does not call fsync() that frequently.
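
To cut the noise, strace can be limited to the sync-type syscalls with
-e trace=. A sketch, assuming strace is installed; watching fdatasync()
and msync() alongside fsync() is my own guess at what might be relevant,
not something taken from the journald source, and the helper names are
hypothetical:

```shell
#!/bin/sh
# trace_journald_syncs: attach to journald and log only sync-type syscalls,
# with wall-clock timestamps (-tt), until interrupted with Ctrl-C.
trace_journald_syncs() {
    strace -tt -e trace=fsync,fdatasync,msync -p "$(pgrep systemd-journal)"
}

# count_syncs: count sync-type calls in strace output read from stdin.
count_syncs() {
    grep -c -E '(fsync|fdatasync|msync)\('
}

# Example (run as root): strace writes its log to stderr, so redirect it
# to a file, then count the calls afterwards.
#   trace_journald_syncs 2> /tmp/journald.strace
#   count_syncs < /tmp/journald.strace
```

Comparing the timestamps in that log with the times at which filefrag
reports new extents should show whether the two are related at all.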

Is it possible that the btrfs behavior you observe is specific to how
it handles memory-mapped files?

> Also, I don't think these journal files are being compressed.
> 
> Using the btrfs-progs/btrfs-debugfs script on a few user journal
> files, I'm seeing massive compression ratios. Maybe I'll try
> Compress=No and see if there's a change.
> 

Only message payloads above some threshold (I think 256 or 512 bytes,
not sure) are compressed; everything else is stored uncompressed. For
average syslog-type messages the payload is far too small. This only
really matters when you store core dumps or similar.

> file: user-1000@6532e07ad7104b1c94d26a5b0fb2ad6e-0000000000059b73-00054d51b3f442ff.journal
> extents 64 disk size 294912 logical size 8388608 ratio 28.44
> file: user-1000@6532e07ad7104b1c94d26a5b0fb2ad6e-000000000002ec5b-00054d4ebb7114e7.journal
> extents 64 disk size 278528 logical size 8388608 ratio 30.12
> file: user-1000@6532e07ad7104b1c94d26a5b0fb2ad6e-00000000000006e5-00054c3c32607483.journal
> extents 320 disk size 5206016 logical size 41943040 ratio 8.06
> 
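
The ratio column in the output quoted above appears to be simply logical
size divided by disk size; recomputing it from the quoted numbers gives
the same values. A sketch with a hypothetical helper, using awk for the
floating-point division:

```shell
#!/bin/sh
# size_ratio LOGICAL DISK: print logical size / on-disk size, 2 decimals.
size_ratio() {
    awk -v logical="$1" -v disk="$2" 'BEGIN { printf "%.2f\n", logical / disk }'
}

size_ratio 8388608 294912     # first file in the listing:  28.44
size_ratio 8388608 278528     # second file:                30.12
size_ratio 41943040 5206016   # third file:                 8.06
```

So the number only says how much smaller the allocated space is than the
file length; by itself it does not say whether compression is the cause.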


