[systemd-devel] Slow startup of systemd-journal on BTRFS

Lennart Poettering lennart at poettering.net
Sun Jun 15 15:13:07 PDT 2014


On Sat, 14.06.14 09:52, Goffredo Baroncelli (kreijack at libero.it) wrote:

> > Which effectively means that by the time the 8 MiB is filled, each 4 KiB 
> > block has been rewritten to a new location and is now an extent unto 
> > itself.  So now that 8 MiB is composed of 2048 new extents, each one a 
> > single 4 KiB block in size.
> 
> Several people pointed to fallocate as the problem. But I don't
> understand the reason.

BTW, the reason we use fallocate() in journald is not about trying to
optimize anything. It's only used for one reason: to avoid SIGBUS on
disk/quota full, since we actually write everything to the files using
mmap(). I mean, writing things with mmap() is always problematic, and
handling write errors is awfully difficult, but at least two of the most
common reasons for failure we'd like to protect against in advance, under
the assumption that disk/quota full will be reported immediately by the
fallocate(), and the mmap writes later on will then necessarily succeed.
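
To make the pattern concrete, here is a minimal sketch of "reserve first,
then write through the mapping". It is not journald's code (journald is
written in C); the function name preallocate_and_map and the file name are
made up for illustration, and the 8 MiB size is simply borrowed from the
quoted text above. It relies on the same assumption stated above: that a
successful fallocate() means the later mmap writes cannot hit disk/quota
full.

```python
import mmap
import os
import tempfile


def preallocate_and_map(path, size):
    """Reserve `size` bytes for `path`, then return a shared writable mapping.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o640)
    try:
        # If the disk or quota is full, this raises OSError (ENOSPC/EDQUOT)
        # right here, where it can be handled like any other error.
        os.posix_fallocate(fd, 0, size)
        # Without the reservation, a store into the mapping that cannot be
        # backed by disk space would surface as SIGBUS instead.
        return mmap.mmap(fd, size, mmap.MAP_SHARED,
                         mmap.PROT_READ | mmap.PROT_WRITE)
    finally:
        # The mapping keeps its own reference; the descriptor can go.
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        journal = os.path.join(tmpdir, "example.journal")
        mapping = preallocate_and_map(journal, 8 * 1024 * 1024)
        mapping[:5] = b"hello"  # plain memory store, no write() call involved
        mapping.flush()
        mapping.close()
```

The point of the ordering is that the only call that can report "no space"
is the explicit one, so the error path is an ordinary exception (an errno
in C) rather than a signal delivered in the middle of a memory write.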

I am not really following why this trips up btrfs, though. I am
not sure I understand why this breaks btrfs COW behaviour. I mean,
fallocate() isn't necessarily supposed to write anything really, it's
mostly about allocating disk space in advance. I would claim that
journald's usage of it is very much in line with the whole reason why it
exists...

Anyway, happy to change these things around if necessary, but first I'd
like to have a very good explanation why fallocate() wouldn't be the
right thing to invoke here, and a suggestion what we should do instead
to cover this use case...

Lennart

-- 
Lennart Poettering, Red Hat

