[systemd-devel] consider dropping defrag of journals on btrfs
Vito Caputo
vcaputo at pengaru.com
Sat Feb 6 22:02:06 UTC 2021
On Fri, Feb 05, 2021 at 05:44:03PM -0700, Chris Murphy wrote:
> On Fri, Feb 5, 2021 at 3:55 PM Lennart Poettering
> <lennart at poettering.net> wrote:
> >
> > On Fr, 05.02.21 20:58, Maksim Fomin (maxim at fomin.one) wrote:
> >
> > > > You know, we issue the btrfs ioctl, under the assumption that if the
> > > > file is already perfectly defragmented it's a NOP. Are you suggesting
> > > > it isn't a NOP in that case?
> > >
> > > > So, what is the reason for defragmenting the journal if BTRFS is
> > > > detected? This does not happen on other filesystems. I have read
> > > > this thread but have not found a clear answer to this question.
> >
> > btrfs like any file system fragments files with nocow a bit. Without
> > nocow (i.e. with cow) it fragments files horribly, given our write
> > pattern (which is: append something to the end, and update a few
> > pointers in the beginning). By upstream default we set nocow, some
> > downstreams/users undo that however. (this is done via tmpfiles,
> > i.e. journald doesn't actually set nocow ever).
>
> I don't see why it's upstream's problem to solve downstream decisions.
> If they want to (re)enable datacow, then they can also setup some kind
> of service to defragment /var/log/journal/ on a schedule, or they can
> use autodefrag.
>
It seems cooperative to me that applications advise the filesystem on
appropriate optimization opportunities.

Taking a step back and looking at what journald is doing, and at how and
when these journal files are accessed, it doesn't strike me as illogical
to tell the fs at archive time that it's a good moment to defragment the
file.
>
> > When we archive a journal file (i.e stop writing to it) we know it
> > will never receive any further writes. It's a good time to undo the
> > fragmentation (we make no distinction whether heavily fragmented,
> > little fragmented or not at all fragmented on this) and thus for the
> > future make access behaviour better, given that we'll still access the
> > file regularly (because archiving in journald doesn't mean we stop
> > reading it, it just means we stop writing it — journalctl always
> > operates on the full data set). defragmentation happens in the bg once
> > triggered, it's a simple ioctl you can invoke on a file. if the file
> > is not fragmented it shouldn't do anything.
>
> ioctl(3, BTRFS_IOC_DEFRAG_RANGE, {start=0, len=16777216, flags=0,
> extent_thresh=33554432, compress_type=BTRFS_COMPRESS_NONE}) = 0
>
> What 'len' value does journald use?
>
journald uses BTRFS_IOC_DEFRAG, there is no range argument; it's the
whole file.
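
For reference, a minimal sketch of what a whole-file defrag request looks
like from userspace. The helper name defrag_whole_file() is hypothetical and
this is not journald's actual code; it only assumes that BTRFS_IOC_DEFRAG
accepts a NULL argument to mean "the whole file", in contrast to the
BTRFS_IOC_DEFRAG_RANGE call in the strace output quoted above, which carries
explicit start/len/extent_thresh values.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/btrfs.h>

/* Hypothetical helper, not taken from journald.
 * Asks btrfs to defragment the whole file behind fd.
 * Returns 0 on success or a negative errno value on failure. */
int defrag_whole_file(int fd) {
    /* No range argument is passed: NULL requests the entire file. */
    if (ioctl(fd, BTRFS_IOC_DEFRAG, NULL) < 0)
        return -errno;
    return 0;
}
```

On a file that does not live on btrfs the ioctl is expected to fail rather
than do anything, so a caller can treat the request as purely advisory and
ignore the error.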
I'm inclined to agree with Lennart that this looks more like a btrfs
issue than a journald issue, based on your claims.

journald is arguably Doing The Right Thing by advising btrfs of a
defrag opportunity. If btrfs can't usefully improve on the file's
current layout, it should NOOP the ioctl. If it's producing more
fragmented files post-defrag, how is that not a btrfs bug?
Some things I didn't see considered in your comparisons are
filesystem free space, age, and concurrent use.
If your comparisons are on fresh filesystems, fragmentation tends to
be much lower as the business of finding contiguous blocks of free
space is trivial. Once the filesystem has aged enough to churn
through the available space, fragmentation increases substantially.

When journald is the only writer on an otherwise idle filesystem, it's
less likely to have its allocations interrupted by allocations to
other writers.

To make meaningful measurements of fragmentation and the necessity of
telling the fs "hey, now's a good time to defrag this file I'm no
longer going to write to", you need to look at worst-case scenarios,
not best-case ones.
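
As one way to put numbers on such comparisons, here is a hedged sketch that
counts a file's extents through the generic FIEMAP ioctl, the same interface
filefrag relies on. The helper name count_extents() is made up for
illustration. Extent count is only a rough proxy for fragmentation, and the
figures only mean something when taken on an aged, busy filesystem before
and after the defrag request.

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <linux/fiemap.h>

/* Hypothetical helper for illustration only.
 * Returns the number of extents mapped for the file behind fd,
 * or a negative errno value on failure. */
long count_extents(int fd) {
    struct fiemap fm;

    memset(&fm, 0, sizeof(fm));
    fm.fm_start = 0;
    fm.fm_length = FIEMAP_MAX_OFFSET;  /* cover the whole file */
    fm.fm_flags = FIEMAP_FLAG_SYNC;    /* sync the file before mapping it */
    fm.fm_extent_count = 0;            /* 0: only fill in fm_mapped_extents */

    if (ioctl(fd, FS_IOC_FIEMAP, &fm) < 0)
        return -errno;
    return (long) fm.fm_mapped_extents;
}
```

Calling it on an archived journal file once before and once some time after
the defrag request (defragmentation happens in the background) gives a
before/after pair to compare across filesystem ages and loads.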
On a different note, I feel like there's an unnecessarily combative
tone to this discussion. Maybe it's just me, but it deterred me from
participating up until this point.
Regards,
Vito Caputo