[systemd-devel] Journald Scalability
matt at 0x01b.net
Mon Sep 10 14:30:34 PDT 2012
On 09/10/2012 02:24 PM, Lennart Poettering wrote:
> On Mon, 10.09.12 15:56, Roland Schatz (roland.schatz at students.jku.at) wrote:
>> On 10.09.2012 09:57, Lennart Poettering wrote:
>>> Well, I am not aware of anybody having done measurements recently
>>> about this. But I am not aware of anybody running into scalability
>>> issues so far.
>> I'm able to share a data point here, see attachment.
>> TLDR: Outputting the last ten journal entries takes 30 seconds on my server.
>> I have not reported this so far, because I'm not really sure whom to
>> blame. Current suspects include the journal, btrfs and hyper-v ;)
>> Some details about my setup: I'm running Arch Linux on a virtual server,
>> running on hyper-v on some windows host (outside of my control). I'm
>> currently using systemd 189, but the journal files are much older.
>> journald.conf is empty (everything commented, i.e., the default). The
>> journal logging was activated in February (note the strange first output
>> line that says the journal ends on 28 Feb, while still containing
>> entries up to right now). Since then, I have not removed/archived any
>> journal files from the system.
>> After issuing journalctl a few times, the time goes down significantly, even
>> for larger values of -n (e.g. the first -n10 takes 30 secs, the second -n10
>> takes 18 secs, the third -n10 takes 0.2 secs; after that, even -n100 takes
>> 0.2 secs, -n500 takes 0.8 secs, and so on). Rebooting or simply waiting a
>> day or so makes it slow again.
>> Btrfs and fragmentation may be an issue: defragmenting the journal files
>> seems to make things better. But it's hard to be sure whether this is
>> really the problem, because defrag could just be pulling the journal
>> files into the fs cache, which would have a similar effect to the
>> repeated journalctl runs...
>> I'm not able to reproduce the problem by copying the whole journal to
>> another system and running journalctl there. I see the same speed-up
>> effect there, but it starts at 2 secs and then goes down to 0.2 secs.
>> Also, I'm not seeing any difference between btrfs and ext4, so maybe
>> really fragmentation is the issue, although I don't really understand
>> how a day of logging could fragment the log files that badly, even on a
>> COW filesystem. Yes, there still is the indirection of the virtual
>> drive, but I have a fixed-size disk file residing on an ntfs drive, so
>> there shouldn't be any noticeable additional fragmentation coming from
>> that setup.
>> I'm not sure what I can do to investigate this further. For me this is a
>> low-priority problem, since the system is running stably and the problem
>> goes away after a few journalctl runs. But if you have anything you'd
>> like me to try, I'd be happy to assist.
> Hmm, these are definitely weird results. A few notes:
> Appending things to the journal is done by adding a few data objects to
> the end of the file and then updating a couple of pointers in the
> front. This is not an ideal access pattern on rotating media, but my
> educated guess would be that this is not your problem (not your only one,
> at least...), as the respective tables are few and should be in memory
> quickly (that said, I didn't do any precise IO pattern measurements for
> this). If the access pattern turns out to be too bad, we could certainly
> improve it (for example by delay-writing the updated pointers).
> Now, I am not sure what such an access pattern means to COW file systems
> such as btrfs. Might be worth experimenting with COW there, for
> example by removing the COW flag from the generated files (via
> FS_NOCOW_FL in FS_IOC_SETFLAGS, which you need to set before the first
> write to the file, i.e. you need to patch journald for that).
> So much for the write access pattern. Most likely that's not impacting
> you much anyway, but the read access pattern is. And that is usually much
> more chaotic (and hence slow on rotating disks) than the write access
> pattern, since we write each journal field only once and then reference
> it from all entries using it. Which might mean that we end up jumping
> around on disk for each entry we try to read as we iterate through its
> fields. But this could easily be improved too: for example, since the
> order of fields is undefined, we could simply order them by offset, so
> that we read them in their on-disk order.
> It would be really good to get some hard data about access patterns
> before we optimize things, though...
How can users provide meaningful data?
I've been experiencing the same problem: systemd 189, btrfs, Arch Linux, very
slow response when trying to view the journal (which includes systemctl status
<service>). I wish I were sure about this, but the slowness might have
started when I enabled FSS (Forward Secure Sealing).