[systemd-devel] consider dropping defrag of journals on btrfs

Phillip Susi phill at thesusis.net
Tue Feb 9 15:17:31 UTC 2021


Chris Murphy writes:

> And I agree 8MB isn't a big deal. Does anyone complain about journal
> fragmentation on ext4 or xfs? If not, then we come full circle to my
> second email in the thread which is don't defragment when nodatacow,
> only defragment when datacow. Or use BTRFS_IOC_DEFRAG_RANGE and
> specify 8MB length. That does seem to consistently no op on nodatacow
> journals which have 8MB extents.

Ok, I agree there.
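For concreteness, here is a minimal sketch of the BTRFS_IOC_DEFRAG_RANGE
idea Chris describes.  This is my reading, not anything from the systemd
patch: I'm taking "specify 8MB length" to map onto the extent_thresh
field (extents at or above the threshold are left alone, which would
explain the consistent no-op on nodatacow journals with 8MB extents),
and the struct layout is the one from linux/btrfs.h:

```python
# Hedged sketch: defragment a journal file while skipping extents that
# are already >= 8 MiB, so the call should be a near no-op on nodatacow
# journals with 8 MiB extents.  Struct layout and ioctl number follow
# struct btrfs_ioctl_defrag_range_args in linux/btrfs.h; the target
# path is up to the caller.
import fcntl
import struct

# struct btrfs_ioctl_defrag_range_args:
#   __u64 start; __u64 len; __u64 flags;
#   __u32 extent_thresh; __u32 compress_type; __u32 unused[4];
DEFRAG_RANGE_FMT = "=QQQII4I"

def _IOW(ioc_type, nr, size):
    # Linux _IOW(): write direction (1) in bits 30-31, size in bits
    # 16-29, type in bits 8-15, nr in bits 0-7.
    return (1 << 30) | (size << 16) | (ioc_type << 8) | nr

BTRFS_IOC_DEFRAG_RANGE = _IOW(0x94, 16, struct.calcsize(DEFRAG_RANGE_FMT))

def defrag_small_extents(path, thresh=8 * 1024 * 1024):
    """Defragment `path`, leaving extents already >= `thresh` alone."""
    args = struct.pack(DEFRAG_RANGE_FMT,
                       0,               # start: beginning of file
                       (1 << 64) - 1,   # len: cover the whole file
                       0,               # flags: no compress/flush
                       thresh,          # extent_thresh: skip big extents
                       0,               # compress_type
                       0, 0, 0, 0)      # unused
    with open(path, "rb+") as f:
        fcntl.ioctl(f.fileno(), BTRFS_IOC_DEFRAG_RANGE, args)
```

Whether systemd would want len capped at 8MB instead of the threshold is
exactly the kind of thing the benchmark data should settle.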

> The reason I'm dismissive is because the nodatacow fragment case is
> the same as ext4 and XFS; the datacow fragment case is both
> spectacular and non-deterministic. The workload will matter where

Your argument seems to be that it's no worse than ext4, so if we don't
defrag there, why do it on btrfs?  Lennart seems to be arguing that the
only reason systemd doesn't defrag on ext4 is that the ioctl is harder
to use; maybe it should defrag there as well.  So he's asking for
actual performance data to evaluate whether the btrfs defrag is
pointless, or whether ext4 should also start defragmenting.  At least I
think that's his point.  Personally I agree ( and showed the
calculations in a previous post ) that 8 MB/fragment has only a
negligible impact on performance and so isn't worth the bother of a
defrag, but he has asked for real world data...
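If anyone wants to produce that data, one simple way is to time a cold
sequential read of a journal file before and after a defrag.  A hedged
sketch (the cache drop needs root, and the journal path is whatever your
system uses):

```python
# Time one sequential read of a file, e.g. a journal under
# /var/log/journal/.  Run as root after
#   echo 3 > /proc/sys/vm/drop_caches
# so the read actually hits the disk rather than the page cache.
import time

def timed_read(path, bufsize=1 << 20):
    """Return (seconds, bytes) for one sequential read of `path`."""
    total = 0
    start = time.monotonic()
    with open(path, "rb") as f:
        while True:
            buf = f.read(bufsize)
            if not buf:
                break
            total += len(buf)
    return time.monotonic() - start, total
```

Comparing the timings on the same file pre- and post-defrag, on both an
HDD and an SSD, would answer Lennart's question directly.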

> And also, only defragmenting on rotation strikes me as leaving
> performance on the table, right? If there is concern about fragmented

No, because fragmentation only adds meaningful latency on HDDs, not
SSDs, which have no seek penalty.
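A back-of-envelope sketch of why, with assumed round figures that are
mine rather than anything measured in this thread ( ~10 ms per HDD
seek, ~0.1 ms per SSD random access, ~150 MB/s sequential throughput
for both ):

```python
# Worst-case per-fragment overhead, assuming every 8 MB fragment costs
# one full random access before its sequential transfer.  All figures
# are illustrative assumptions, not measurements.
SEEK_MS = {"hdd": 10.0, "ssd": 0.1}
THROUGHPUT_MB_S = 150.0
FRAGMENT_MB = 8.0

transfer_ms = FRAGMENT_MB / THROUGHPUT_MB_S * 1000  # ~53 ms per fragment

for disk, seek in SEEK_MS.items():
    overhead = seek / (seek + transfer_ms) * 100
    print(f"{disk}: ~{overhead:.1f}% worst-case overhead per fragment")
```

With these numbers the SSD overhead is well under one percent, while
the HDD worst case is in the double digits, which is why the HDD data
is the interesting part.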

> But it sounds to me like you want to learn what the performance is of
> journals defragmented with BTRFS_IOC_DEFRAG specifically? I don't think
> it's interesting because you're still better off leaving nodatacow
> journals alone, and something still has to be done in the datacow

Except that you're not.  Your definition of "better off" appears to
apply only to SSDs, and only because fewer writes are preferable there
to less fragmentation.  On an HDD, defragmenting is a good thing.
Lennart seems to want real world performance data to evaluate just
*how* good and whether it's worth the bother, at least for HDDs.  For
SSDs, I believe he agreed that it may as well be shut off since it
provides no benefit there, but your patch kills it on HDDs as well.

> Is there a test mode for journald to just dump a bunch of random stuff
> into the journal to age it? I don't want to wait weeks to get a dozen
> journal files.

The cause of the fragmentation is slowly appending to the file over
time, so if you dump a bunch of data in too quickly you would eliminate
the fragmentation.  To speed things up a little without losing the
slow-append pattern, you might try:

while true ; do
    logger "This is a test log message to act as filler"
    sleep 1
done
