[systemd-devel] readahead and read_ahead_kb - reading in way too much data - a proposed solution

Kok, Auke-jan H auke-jan.h.kok at intel.com
Tue Apr 3 12:17:56 PDT 2012


All,

I've been comparing the readahead implementation that systemd has with
some of the code that Arjan and I maintained in a separate readahead
implementation, to ensure that we get adequate performance out of the
systemd version. We're not looking at small differences in the
implementation; by and large, the performance should be comparable.

A major obstacle we've identified is that the total readahead volume
(that is, the approximate size of stuff being listed in the readahead
pack) is significantly larger than it needs to be. The root cause for
this is that the default read_ahead_kb is set to 128kb.

What happens is that on first boot, when the pack file doesn't exist,
page faults are generated for all the libraries and files that are
needed to start up. The kernel sees the pages, and applies 128kb
read_ahead_kb on top of those. As a result, we will read 128kb more
for each file touched on top of every page needed, provided that we
don't go beyond the end of the file. While this provides a reasonable
increase in speed on rotating media when booting without any readahead
acceleration, this is going to hurt later on.

Quick data summary: about 99% of all the files in the readahead pack
are read entirely, with 128kb read_ahead_kb. In reality, the amount
actually needed is closer to 60%-70% on average.


Some data:

My test case is a light-weight desktop on an ultrabook. While the
system is using an SSD, we're looking at the volume, not boot time.

read_ahead_kb    resulting pack volume
128kb         -> 83mb
 16kb         -> 53mb
  8kb         -> 50mb


The modification to the read_ahead_kb is done by a static C binary
that runs in all cases and modifies the sysfs files so the comparison
is not skewed - even in the default case the static binary runs. After
the modification it exec()'s systemd as usual.
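
For readers without the attachment, the general shape of such a
pre-init tuner can be sketched as follows. This is not the attached
ratune.c, only a minimal illustration; the function names are made up
here, and the sysfs node and init path are passed in by the caller
because they depend on the system.

```c
#include <stdio.h>
#include <unistd.h>

/* Write a read_ahead_kb value to the given sysfs node.
 * Returns 0 on success, -1 on failure. */
int set_read_ahead_kb(const char *node, unsigned int kb)
{
    FILE *f = fopen(node, "w");
    if (!f)
        return -1;

    int ok = fprintf(f, "%u\n", kb) > 0;

    /* sysfs write errors may only surface when the buffer is flushed */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}

/* Tune the node, then hand over to the real init (argv[0]).
 * Only returns if execv() itself failed. */
int tune_and_exec(const char *node, unsigned int kb, char *const argv[])
{
    if (set_read_ahead_kb(node, kb) < 0)
        perror(node);   /* keep booting even if the tuning failed */

    execv(argv[0], argv);
    return -1;
}
```

A main() would call tune_and_exec() with something like
/sys/block/sda/queue/read_ahead_kb and the path to the systemd binary;
both are examples only and need adjusting for the device and
distribution in use.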


As you can see, the total RA volume is significantly decreased by
lowering the RA size before we boot. In general, though, we don't want
to keep it low: to retain the speedup from readahead, the RA size
should be put back to at least the default. (That default is
experimental, mostly chosen by kernel developers saying "hey, this
kernel compile is a great benchmark for testing a good RA size
setting".)

So, at a minimum we want to revert any lower RA setting back to the
default once we're done with readahead collection.

However, due to the usage of fadvise WILLNEED in the replay service,
in subsequent boots, we don't have a problem with a higher default RA
size at all - since we tell the kernel exactly which pages we need,
the kernel is unlikely to see page faults for pages we have not
read ahead - and so the RA volume stays low on subsequent boots.
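
To illustrate the replay side, here is a minimal sketch of asking the
kernel for one specific byte range with posix_fadvise() and
POSIX_FADV_WILLNEED. It is not the systemd replay code; the helper name
prefetch_range is invented for this example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <fcntl.h>
#include <unistd.h>

/* Ask the kernel to start reading bytes [offset, offset + len) of
 * `path` into the page cache. Returns -1 if the file cannot be opened,
 * otherwise the posix_fadvise() result (0 on success, an errno value
 * on failure). */
int prefetch_range(const char *path, off_t offset, off_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int rc = posix_fadvise(fd, offset, len, POSIX_FADV_WILLNEED);

    close(fd);
    return rc;
}
```

Because the request names an exact range, the pages needed later should
already be cached by the time they are touched, so few faults are left
to trigger the kernel's own read_ahead_kb window.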


The solution:

Given the above data, the solution is relatively simple:

collector service:

if (readahead pack file does not exist) {
    - map block device to bdi/read_ahead_kb node
    - get the current read_ahead_kb
    - store this default somewhere
    - lower the read_ahead_kb to 16kb or 8kb
}

normal collector code...

if (have read a default read_ahead_kb) {
    - read the bdi/read_ahead_kb node
    - if (current value == value we put in there earlier) {
          restore the original
      }
}
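
The procedure above can be sketched in C roughly as follows. This is a
sketch under assumptions, not the eventual patch: all function names
are invented here, the restore step is read as "only restore if the
node still holds the value we wrote", and the device mapping assumes a
filesystem backed directly by a single block device, whose bdi node
then shows up as /sys/class/bdi/MAJOR:MINOR/read_ahead_kb. Other setups
will need a different mapping.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/* Build the bdi read_ahead_kb path for the device holding `file`.
 * Assumption: st_dev identifies a real block device. */
int bdi_node_for(const char *file, char *buf, size_t len)
{
    struct stat st;
    if (stat(file, &st) < 0)
        return -1;

    int n = snprintf(buf, len, "/sys/class/bdi/%u:%u/read_ahead_kb",
                     major(st.st_dev), minor(st.st_dev));
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

int read_kb(const char *node, unsigned int *kb)
{
    FILE *f = fopen(node, "r");
    if (!f)
        return -1;
    int ok = fscanf(f, "%u", kb) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

int write_kb(const char *node, unsigned int kb)
{
    FILE *f = fopen(node, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%u\n", kb) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Before collecting: remember the current value, then lower it. */
int ra_lower(const char *node, unsigned int low_kb, unsigned int *saved_kb)
{
    if (read_kb(node, saved_kb) < 0)
        return -1;
    return write_kb(node, low_kb);
}

/* After collecting: restore the saved value, but only if the node
 * still holds what we wrote; otherwise someone else changed it and
 * we leave it alone. */
int ra_restore(const char *node, unsigned int low_kb, unsigned int saved_kb)
{
    unsigned int now;
    if (read_kb(node, &now) < 0)
        return -1;
    if (now != low_kb)
        return 0;
    return write_kb(node, saved_kb);
}
```

The collector would call ra_lower() only when no pack file exists, run
its normal collection, and call ra_restore() afterwards with the saved
value.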


Attached are:
- a pack file dumper
- a quick hack to tune the read_ahead_kb before booting systemd - make
sure you compile with -static


Comments? If the proposed solution seems agreed upon, I will implement
a patch that accomplishes the above procedure. Since the hard part is
mapping the sysfs nodes, I haven't done this just yet.

Auke
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pack-dump.c
Type: text/x-csrc
Size: 1757 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/systemd-devel/attachments/20120403/0f9023d6/attachment.c>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ratune.c
Type: text/x-csrc
Size: 437 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/systemd-devel/attachments/20120403/0f9023d6/attachment-0001.c>

