[systemd-devel] Antw: [EXT] Re: consider dropping defrag of journals on btrfs
Ulrich Windl
Ulrich.Windl at rz.uni-regensburg.de
Mon Feb 8 08:24:22 UTC 2021
>>> Phillip Susi <phill at thesusis.net> wrote on 05.02.2021 at 16:02 in
message
<87a6si5yjq.fsf at vps.thesusis.net>:
> Chris Murphy writes:
>
>> But it gets worse. The way systemd-journald is submitting the journals
>> for defragmentation is making them more fragmented than just leaving
>> them alone.
>
> Wait, doesn't it just create a new file, fallocate the whole thing, copy
> the contents, and delete the original? How can that possibly make
> fragmentation *worse*?
>
>> All of those archived files have more fragments (post defrag) than
>> they had when they were active. And here is the FIEMAP for the 96MB
>> file which has 92 fragments.
>
> How the heck did you end up with nearly 1 frag per mb?
I didn't follow the thread closely, but there was a happy mix of IOPS and
fragments (and no bandwidth figures).
But I wonder here: isn't it part of the Btrfs design that writes are
fragmented if there is no contiguous free space?
The idea was *not* to spend time searching for a good place to write, but
to use the next available one.
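As a back-of-the-envelope check of the numbers quoted above (92 fragments
reported by FIEMAP for a 96 MB file), a tiny Python sketch:

```python
# Rough fragmentation arithmetic for the journal file quoted above:
# 92 extents in a 96 MB file, i.e. just under one fragment per MB.
def fragments_per_mb(extents: int, size_mb: int) -> float:
    """Average number of extents per megabyte of file data."""
    return extents / size_mb

print(f"{fragments_per_mb(92, 96):.2f} fragments per MB")  # ~0.96
```

So "nearly 1 frag per MB" simply means the extents average about 1 MB each.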
>
>> If you want an optimization that's actually useful on Btrfs,
>> /var/log/journal/ could be a nested subvolume. That would prevent any
Actually I still didn't get the benefit of a Btrfs subvolume, but that's a
different topic:
Don't all writes end up in a single storage pool?
>> snapshots above from turning the nodatacow journals into datacow
>> journals, which does significantly increase fragmentation (it would in
>> the exact same case if it were a reflink copy on XFS for that matter).
>
> Wouldn't that mean that when you take snapshots, they don't include the
> logs? That seems like an anti-feature that violates the principle of
> least surprise. If I make a snapshot of my root, I *expect* it to
> contain my logs.
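For reference, the nodatacow behaviour discussed above corresponds to the
FS_NOCOW_FL inode flag (what `chattr +C` sets). A minimal Python sketch for
inspecting it; the ioctl numbers below are the x86-64 Linux values from
linux/fs.h and are an assumption on my part:

```python
import fcntl
import struct

# Assumed x86-64 Linux values (from linux/fs.h):
FS_IOC_GETFLAGS = 0x80086601
FS_NOCOW_FL = 0x00800000  # the bit that chattr +C sets on Btrfs

def has_nocow(flags: int) -> bool:
    """True if the NOCOW bit is set in an inode flags word."""
    return bool(flags & FS_NOCOW_FL)

def inode_flags(path: str) -> int:
    """Read a file's inode flags via FS_IOC_GETFLAGS.
    Raises OSError on filesystems that do not support the ioctl."""
    with open(path, "rb") as f:
        buf = fcntl.ioctl(f.fileno(), FS_IOC_GETFLAGS, struct.pack("l", 0))
        return struct.unpack("l", buf)[0]
```

A snapshot taken above /var/log/journal turns such files copy-on-write for
subsequent overwrites, which is where the extra fragmentation comes from.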
>
>> I don't get the iops thing at all. What we care about in this case is
>> latency. A least noticeable latency of around 150ms seems reasonable
>> as a starting point, that's where users realize a delay between a key
>> press and a character appearing. However, if I check for 10ms latency
>> (using bcc-tools fileslower) when reading all of the above journals at
>> once:
>>
>> $ sudo journalctl -D
>> /mnt/varlog33/journal/b51b4a725db84fd286dcf4a790a50a1d/ --no-pager
>>
>> Not a single report. None. Nothing took even 10ms. And those journals
>> are more fragmented than your 20 in a 100MB file.
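bcc-tools fileslower traces read latency in the kernel; as a crude
userspace stand-in for the same 10 ms check, one could time sequential
reads and report any that exceed a threshold (path and chunk size here are
arbitrary choices of mine):

```python
import time

def slow_reads(path: str, threshold_ms: float = 10.0, chunk: int = 1 << 20):
    """Read `path` sequentially and return (offset, latency_ms) for every
    read() call slower than `threshold_ms`. A crude userspace stand-in
    for what bcc-tools fileslower measures at the kernel level."""
    slow = []
    offset = 0
    with open(path, "rb") as f:
        while True:
            t0 = time.monotonic()
            data = f.read(chunk)
            dt_ms = (time.monotonic() - t0) * 1000.0
            if not data:
                break
            if dt_ms > threshold_ms:
                slow.append((offset, dt_ms))
            offset += len(data)
    return slow
```

An empty result for a whole journal directory is the "not a single report"
outcome described above.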
>>
>> I don't have any hard drives to test this on. This is what, 10% of the
>> market at this point? The best you can do there is the same as on SSD.
>
> The above sounded like great data, but not if it was done on SSD. Of
> course it doesn't cause latency on an SSD. I don't know about market
> trends, but I stopped trusting my data to SSDs a few years ago when my
> ext4 fs kept being corrupted and it appeared that the FTL of the drive
> was randomly swapping the contents of different sectors around when I
> found things like the contents of a text file in a block of the inode
> table or a directory.
>
>> You can't depend on sysfs to conditionally do defragmentation on only
>> rotational media, too many fragile media claim to be rotating.
Probably to keep software from breaking... ;-)
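For what it's worth, the sysfs flag in question is
/sys/block/<dev>/queue/rotational; a small sketch of reading it, with the
caveat from above built in (many SSDs behind USB bridges or hypervisors
report 1 here anyway):

```python
from pathlib import Path

def parse_rotational(text: str) -> bool:
    """Interpret queue/rotational contents: "1" claims rotating media."""
    return text.strip() == "1"

def claims_rotational(device: str) -> bool:
    """Check sysfs for e.g. device="sda". The answer is only a claim:
    USB bridges and VMs often report rotating media for SSDs."""
    return parse_rotational(
        Path(f"/sys/block/{device}/queue/rotational").read_text())
```

Which is exactly why conditioning defragmentation on this flag is fragile.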
>
> It sounds like you are arguing that it is better to do the wrong thing
> on all SSDs rather than do the right thing on ones that aren't broken.
>
>> Looking at the two original commits, I think they were always in
>> conflict with each other, happening within months of each other. They
>> are independent ways of dealing with the same problem, where only one
>> of them is needed. And the best of the two is fallocate+nodatacow
>> which makes the journals behave the same as on ext4 where you also
>> don't do defragmentation.
>
> This makes sense.
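The fallocate half of that pairing can be approximated in userspace like
so (the size is an arbitrary example of mine; on Btrfs the preallocation
only stays contiguous if the file is also nodatacow):

```python
import os

def preallocate(path: str, size: int) -> None:
    """Preallocate `size` bytes for a file, as journald does for journal
    files, so later appends land in already-reserved space."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o640)
    try:
        os.posix_fallocate(fd, 0, size)
    finally:
        os.close(fd)
```

With preallocation plus nodatacow, the journals behave much as they would
on ext4, where no defragmentation pass is done either.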
> _______________________________________________
> systemd-devel mailing list
> systemd-devel at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/systemd-devel