[systemd-devel] journal fragmentation on Btrfs

Sun Apr 16 20:30:45 UTC 2017

Hi,

This is on a Fedora 26 workstation (systemd-233-3.fc26.x86_64) that was
clean installed maybe a couple of weeks ago. The drive is NVMe.

# filefrag *
system.journal: 9283 extents found
user-1000.journal: 3437 extents found
# lsattr
----------------C-- ./system.journal
----------------C-- ./user-1000.journal

I do manual snapshots before software updates, which means new writes
to these files are subject to COW, but additional writes to the same
extents are overwrites and are not COW because of chattr +C. I've used
this same strategy for a long time, since systemd-journald defaults to
+C for journal files; but I've not seen them get this fragmented this
quickly.

Meanwhile, on a Fedora 25 Server (systemd-231-14.fc25.x86_64) running
from an SD card, I've made /var/log a nested subvolume so that when I
snapshot the root subvolume, the contents of /var/log are not included
in the snapshot. These files should therefore always be no-COW, and yet
they too are rather fragmented.

# filefrag *
system@00054c130c57bb79-5df6c2871d1edf1e.journal~: 1 extent found
system@00054cb3cd18d71b-6a815220d62cc6ea.journal~: 1 extent found
system@01b44589014542e3b48df31f152c0916-0000000000000001-000542e1fb4550e7.journal: 1 extent found
system@01b44589014542e3b48df31f152c0916-000000000000ca2b-00054546539416e8.journal: 1 extent found
system@01b44589014542e3b48df31f152c0916-00000000000198f3-000547aac217c85b.journal: 1 extent found
system.journal: 2992 extents found
user-1000@00054c130a314ee9-4bb9fd0a9268dc1c.journal~: 1 extent found
user-1000@ac4b2e5ded7d4e0dbcac6fc45430c857-00000000000005a9-000542e1fe209094.journal: 1 extent found
user-1000@ac4b2e5ded7d4e0dbcac6fc45430c857-000000000000cafe-0005454b13a0349f.journal: 1 extent found
user-1000@ac4b2e5ded7d4e0dbcac6fc45430c857-000000000001abe0-0005482397f286a5.journal: 1 extent found
user-1000.journal: 405 extents found
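
For anyone who wants to compare their own systems, here is a rough
sketch that filters the filefrag summary lines down to the files above
some extent count, worst first. The function name "fragmented" and the
default threshold of 100 are just my placeholders, not anything from
systemd or e2fsprogs.

```shell
# Print files whose extent count exceeds a threshold, worst first.
# Reads `filefrag` summary lines ("name: N extents found") on stdin.
# "fragmented" and the default of 100 are illustrative choices only.
fragmented() {
    threshold="${1:-100}"
    awk -v t="$threshold" '
        match($0, /: [0-9]+ extents? found$/) {
            name = substr($0, 1, RSTART - 1)   # text before ": N extent(s) found"
            n = substr($0, RSTART + 2) + 0     # leading number of the remainder
            if (n > t) print n "\t" name
        }' | sort -rn
}
```

Something like "filefrag * | fragmented 100", run in the journal
directory, should then list only the active, heavily fragmented files.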

What's going on is that there are many 4096-byte extents. Maybe this is
a consequence of frequent fsync?
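
To put a number on "many", a sketch like the following could count the
one-block extents in "filefrag -v" output. It assumes the usual
e2fsprogs table layout, where the fourth colon-separated field is the
extent length in blocks, and a 4096-byte block size. The function name
is made up for illustration.

```shell
# Count single-block extents in `filefrag -v` output read from stdin.
# Assumption: extent rows look like "idx: logical: physical: length: ...",
# so colon-separated field 4 is the extent length in blocks.
single_block_extents() {
    awk -F':' '
        NF >= 4 && $1 ~ /^ *[0-9]+$/ {   # extent rows start with a numeric index
            total++
            if ($4 + 0 == 1) small++
        }
        END { printf "%d of %d extents are one block long\n", small, total }'
}
```

Usage would be along the lines of
"filefrag -v system.journal | single_block_extents".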

On the plus side, even after a 'reboot -f' or a forced power off, I get
pretty much everything from the last few seconds in the journal on the
next boot. That's pretty good. Maybe doing better is too much hassle,
e.g. not fsyncing on Btrfs and just letting its normal 30s commit time
apply, with journald starting to fsync only if things start crashing...
some sort of dynamic trigger.

There could be 8000 things higher priority than this, though; this isn't broken.

Output from
# filefrag -v system.journal
# btrfs-debugfs -f system.journal

https://drive.google.com/open?id=0B_2Asp8DGjJ9UEdyVFRfU0c2V2s

-- 
Chris Murphy