[systemd-devel] Antw: [EXT] Re: consider dropping defrag of journals on btrfs

Ulrich Windl Ulrich.Windl at rz.uni-regensburg.de
Mon Feb 8 08:24:22 UTC 2021


>>> Phillip Susi <phill at thesusis.net> wrote on 05.02.2021 at 16:02 in
message
<87a6si5yjq.fsf at vps.thesusis.net>:

> Chris Murphy writes:
> 
>> But it gets worse. The way systemd-journald is submitting the journals
>> for defragmentation is making them more fragmented than just leaving
>> them alone.
> 
> Wait, doesn't it just create a new file, fallocate the whole thing, copy
> the contents, and delete the original?  How can that possibly make
> fragmentation *worse*?
> 
>> All of those archived files have more fragments (post defrag) than
>> they had when they were active. And here is the FIEMAP for the 96MB
>> file which has 92 fragments.
> 
> How the heck did you end up with nearly 1 frag per mb?

I didn't follow the thread closely, but it contained a happy mix of IOPS and
fragments (and no bandwidth).
What I wonder here: Isn't it part of the Btrfs concept that writes are
fragmented if there is no contiguous free space?
The idea was *not* to spend time trying to find a good space to write to, but
to use the next available one.
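
For reference, the rewrite Phillip describes above (new file, fallocate the
whole thing, copy the contents, replace the original) could be sketched in
Python roughly as follows. This only illustrates that description; it is not
claimed to be what systemd-journald actually does, and the function name
rewrite_preallocated is made up for the example. Preallocation only reserves
the space; whether the result is contiguous still depends on the free space
the filesystem has available.

```python
import os
import shutil
import tempfile


def rewrite_preallocated(path):
    """Copy `path` into a freshly preallocated file and swap it into place.

    Hypothetical helper for illustration only. Returns the file size in bytes.
    """
    size = os.path.getsize(path)
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so that os.replace()
    # stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            if size > 0:
                # Reserve the full length up front; the allocator may or may
                # not be able to satisfy this with one contiguous range.
                os.posix_fallocate(dst.fileno(), 0, size)
            shutil.copyfileobj(src, dst)
            dst.flush()
            os.fsync(dst.fileno())
        shutil.copymode(path, tmp_path)
        # Replacing the original also drops the old, fragmented copy.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return size
```

The fragment count before and after such a rewrite can be compared with
filefrag -v on the file.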

> 
>> If you want an optimization that's actually useful on Btrfs,
>> /var/log/journal/ could be a nested subvolume. That would prevent any

Actually I still didn't get the benefit of a Btrfs subvolume, but that's a
different topic:
Don't all writes end up in a single storage pool?

>> snapshots above from turning the nodatacow journals into datacow
>> journals, which does significantly increase fragmentation (it would in
>> the exact same case if it were a reflink copy on XFS for that matter).
> 
> Wouldn't that mean that when you take snapshots, they don't include the
> logs?  That seems like an anti-feature that violates the principle of
> least surprise.  If I make a snapshot of my root, I *expect* it to
> contain my logs.
> 
>> I don't get the iops thing at all. What we care about in this case is
>> latency. A least noticeable latency of around 150ms seems reasonable
>> as a starting point, that's where users realize a delay between a key
>> press and a character appearing. However, if I check for 10ms latency
>> (using bcc-tools fileslower) when reading all of the above journals at
>> once:
>>
>> $ sudo journalctl -D
>> /mnt/varlog33/journal/b51b4a725db84fd286dcf4a790a50a1d/ --no-pager
>>
>> Not a single report. None. Nothing took even 10ms. And those journals
>> are more fragmented than your 20 in a 100MB file.
>>
>> I don't have any hard drives to test this on. This is what, 10% of the
>> market at this point? The best you can do there is the same as on SSD.
> 
> The above sounded like great data, but not if it was done on SSD.  Of
> course it doesn't cause latency on an SSD.  I don't know about market
> trends, but I stopped trusting my data to SSDs a few years ago when my
> ext4 fs kept being corrupted and it appeared that the FTL of the drive
> was randomly swapping the contents of different sectors around when I
> found things like the contents of a text file in a block of the inode
> table or a directory.
> 
>> You can't depend on sysfs to conditionally do defragmentation on only
>> rotational media, too many fragile media claim to be rotating.

Probably to keep software from breaking... ;-)
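
For context, the sysfs check being discussed amounts to reading
queue/rotational for the block device. A minimal Python sketch, assuming the
usual /sys/block/<dev>/queue/rotational layout; the helper name is_rotational
and the sysfs_root parameter are invented for illustration. The kernel only
reports what the device claims, so a True result is a hint and not proof of
a spinning disk.

```python
from pathlib import Path


def is_rotational(device, sysfs_root="/sys/block"):
    """Return True/False from queue/rotational, or None if it is unreadable.

    Hypothetical helper; `device` is a kernel block device name such as "sda".
    """
    flag = Path(sysfs_root) / device / "queue" / "rotational"
    try:
        value = flag.read_text().strip()
    except OSError:
        # Device missing or attribute not exposed: report "unknown".
        return None
    if value == "1":
        return True
    if value == "0":
        return False
    return None
```

A caller would use it as is_rotational("sda") and should treat None as
"unknown" rather than as either answer.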

> 
> It sounds like you are arguing that it is better to do the wrong thing
> on all SSDs rather than do the right thing on ones that aren't broken.
> 
>> Looking at the two original commits, I think they were always in
>> conflict with each other, happening within months of each other. They
>> are independent ways of dealing with the same problem, where only one
>> of them is needed. And the best of the two is fallocate+nodatacow
>> which makes the journals behave the same as on ext4 where you also
>> don't do defragmentation.
> 
> This makes sense.