[systemd-devel] consider dropping defrag of journals on btrfs

Lennart Poettering lennart at poettering.net
Thu Feb 4 13:53:01 UTC 2021


On Mi, 03.02.21 23:11, Chris Murphy (lists at colorremedies.com) wrote:

> On Wed, Feb 3, 2021 at 9:46 AM Lennart Poettering
> <lennart at poettering.net> wrote:
> >
> > Performance is terrible if cow is used on journal files while we write
> > them.
>
> I've done it for a year on NVMe. The latency is so low that it
> doesn't matter.

Maybe do it on rotating media...

> > It would be great if we could turn datacow back on once the files are
> > archived, and then benefit from compression/checksumming and
> > stuff. Not sure if there's any sane API for that in btrfs besides
> > rewriting the whole file, though. Does anyone know?
>
> A compressed file results in a completely different encoding and
> extent size, so it's a complete rewrite of the whole file, regardless
> of the cow/nocow status.
>
> Without compression it would still be a complete rewrite, because a
> datacow extent is in effect a different extent type, one that carries
> checksums. I.e. a reflink copy of a nodatacow file can only be a
> nodatacow file; a reflink copy of a datacow file can only be a
> datacow file. The conversion between them is basically 'cp
> --reflink=never', and you get a complete rewrite.
>
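A minimal sketch of that constraint, assuming btrfs's refusal to
reflink between checksummed and unchecksummed inodes surfaces as
EINVAL from the generic FICLONE ioctl (the file names are
placeholders):

#include <fcntl.h>
#include <linux/fs.h>     /* FICLONE */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void) {
        /* Hypothetical paths: a nodatacow journal and a fresh file in
         * a datacow directory. */
        int src = open("system.journal", O_RDONLY);
        int dst = open("archived.journal", O_WRONLY | O_CREAT, 0640);
        if (src < 0 || dst < 0)
                return 1;

        /* Reflink the source into the destination.  On btrfs this is
         * expected to fail with EINVAL when one file is nodatacow and
         * the other is not, which is why the conversion needs a full
         * data copy instead. */
        if (ioctl(dst, FICLONE, src) < 0)
                perror("FICLONE");

        close(src);
        close(dst);
        return 0;
}
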
> But you also get a complete rewrite of extents by submitting the file
> for defragmentation, depending on the target extent size.
>
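A minimal sketch of such a submission via the BTRFS_IOC_DEFRAG_RANGE
ioctl, assuming an archived (datacow) file; the path and the 32 MiB
target extent size are placeholder choices:

#include <fcntl.h>
#include <linux/btrfs.h>  /* BTRFS_IOC_DEFRAG_RANGE */
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void) {
        int fd = open("archived.journal", O_RDWR);  /* placeholder path */
        if (fd < 0)
                return 1;

        struct btrfs_ioctl_defrag_range_args args;
        memset(&args, 0, sizeof(args));
        args.start = 0;
        args.len = (__u64)-1;                    /* whole file */
        args.extent_thresh = 32 * 1024 * 1024;   /* target extent size */

        /* Extents smaller than extent_thresh get rewritten; larger
         * ones are left alone, so the threshold controls how much of
         * the file is actually rewritten. */
        if (ioctl(fd, BTRFS_IOC_DEFRAG_RANGE, &args) < 0)
                perror("BTRFS_IOC_DEFRAG_RANGE");

        close(fd);
        return 0;
}
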
> It is possible to do what you want by no longer setting nodatacow on
> the enclosing dir. Create a 0-length journal file, set nodatacow on
> that file while it is still empty, then fallocate it. That gets you a
> nodatacow active journal. Then you can duplicate it in place under a
> new name, and the copy will be datacow, and automatically compressed
> if compression is enabled.
>
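A minimal sketch of that creation sequence, using the FS_NOCOW_FL
inode flag (which btrfs honours only while the file is still empty);
the path and preallocation size are placeholders:

#define _GNU_SOURCE       /* fallocate */
#include <fcntl.h>
#include <linux/fs.h>     /* FS_IOC_GETFLAGS, FS_IOC_SETFLAGS, FS_NOCOW_FL */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void) {
        /* Hypothetical path inside a datacow directory. */
        int fd = open("/var/log/journal/system.journal",
                      O_RDWR | O_CREAT | O_EXCL, 0640);
        if (fd < 0)
                return 1;

        /* Mark the file nodatacow while it is still zero-length;
         * btrfs ignores the flag once the file has data. */
        int flags = 0;
        if (ioctl(fd, FS_IOC_GETFLAGS, &flags) == 0) {
                flags |= FS_NOCOW_FL;
                if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
                        perror("FS_IOC_SETFLAGS");
        }

        /* Preallocate so the journal grows in large contiguous
         * extents; 8 MiB is an arbitrary example size. */
        if (fallocate(fd, 0, 0, 8 * 1024 * 1024) < 0)
                perror("fallocate");

        close(fd);
        return 0;
}
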
> But the write hit has already happened: the journal data was written
> into this file over its lifetime. Just rename it on rotate; that's
> the least IO impact possible at this point. Defragmenting it means
> even more writes, for little gain if any, unless the file is datacow,
> which isn't the journald default.

You are focusing only on the one-time iops generated during archival,
and ignoring the extra latency that fragmented files cost on every
later access. Show me that the iops reduction during the one-time
operation matters and that the extra access latency doesn't, and we
can look into making changes. But without anything resembling
profiling we are just blind people in the fog...

Lennart

--
Lennart Poettering, Berlin

