[systemd-devel] readahead and read_ahead_kb - reading in way too much data - a proposed solution
Lennart Poettering
lennart at poettering.net
Tue Apr 3 13:50:31 PDT 2012
On Tue, 03.04.12 12:17, Kok, Auke-jan H (auke-jan.h.kok at intel.com) wrote:
Heya,
> The modification to the read_ahead_kb is done by a static C binary
> that runs in all cases and modifies the sysfs files so the comparison
> is not skewed - even in the default case the static binary runs. After
> the modification it exec()'s systemd as usual.
>
> As you can see, the total RA volume is significantly decreased by
> lowering the RA size before we boot, but, in general, we don't want to
> keep it low at all - to keep the speedup from readahead, the RA size
> should be put back to the default at a minimum (the default is
> experimental - mostly chosen by kernel developers saying "hey, this
> kernel compile is a great benchmark for testing a good RA size
> setting").
>
> So, at a minimum we want to revert any lower RA setting back to the
> default once we're done with readahead collection.
>
> However, due to the usage of fadvise WILLNEED in the replay service,
> in subsequent boots, we don't have a problem with a higher default RA
> size at all - since we tell the kernel exactly which pages we need,
> the kernel is unlikely to see page faults for pages we have not
> read ahead - and so the RA volume stays low on subsequent boots.
Ah, OK, that makes a ton of sense. When you first mentioned this on IRC
I got the impression that WILLNEED was also subject to RA, but it
actually isn't. We just don't want RA during the initial collection.
Got it.
> Given above data, the solution is relatively simple:
>
> collector service:
>
> if (readahead pack file does not exist) {
>     - map block device to bdi/read_ahead_kb node
>     - get the current read_ahead_kb
>     - store this default somewhere
>     - lower the read_ahead_kb to 16 KB or 8 KB
> }
>
> normal collector code...
>
> if (have stored a default read_ahead_kb) {
>     - read the bdi/read_ahead_kb node
>     - if (value just read == value we put in there earlier) {
>           restore the original
>       }
> }
Sounds good.
The tuning thing should be easy to do, similar to the code that is
already in bump_request_nr().
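The lower-and-restore procedure quoted above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the eventual patch: all function names (ra_path_for_dev(), ra_read(), ra_write(), ra_lower(), ra_restore()) are hypothetical, and the path layout /sys/dev/block/MAJOR:MINOR/bdi/read_ahead_kb is assumed to hold only for whole-disk device numbers. Mapping a partition or a file system's st_dev to the right node is the hard part mentioned in the quoted mail and is not solved here.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Assumption: `dev` is a whole-disk device number that has a bdi link. */
static int ra_path_for_dev(dev_t dev, char *buf, size_t len) {
        int n = snprintf(buf, len, "/sys/dev/block/%u:%u/bdi/read_ahead_kb",
                         major(dev), minor(dev));
        return (n < 0 || (size_t) n >= len) ? -ENAMETOOLONG : 0;
}

/* Read the current value (in KB) from the node at `path`. */
static int ra_read(const char *path, unsigned long *kb) {
        FILE *f;
        int r = 0;

        assert(path && kb);

        f = fopen(path, "r");
        if (!f)
                return -errno;
        if (fscanf(f, "%lu", kb) != 1)
                r = -EIO;
        fclose(f);
        return r;
}

/* Write a new value (in KB) to the node at `path`. */
static int ra_write(const char *path, unsigned long kb) {
        FILE *f;
        int r = 0;

        assert(path);

        f = fopen(path, "w");
        if (!f)
                return -errno;
        if (fprintf(f, "%lu\n", kb) < 0)
                r = -EIO;
        if (fclose(f) != 0 && r == 0)
                r = -errno;
        return r;
}

/* Remember the current default in *saved_kb, then lower the setting. */
static int ra_lower(const char *path, unsigned long low_kb,
                    unsigned long *saved_kb) {
        int r = ra_read(path, saved_kb);
        if (r < 0)
                return r;
        return ra_write(path, low_kb);
}

/* Restore the saved default, but only if the node still holds the value
 * we wrote. Returns 1 if restored, 0 if someone else changed the value
 * in the meantime (left alone), negative errno-style value on error. */
static int ra_restore(const char *path, unsigned long low_kb,
                      unsigned long saved_kb) {
        unsigned long cur;
        int r = ra_read(path, &cur);
        if (r < 0)
                return r;
        if (cur != low_kb)
                return 0;
        r = ra_write(path, saved_kb);
        return r < 0 ? r : 1;
}
```

The collector would call ra_lower() before collection starts when no pack file exists, and ra_restore() once collection is done. Comparing against the lowered value before restoring avoids clobbering a setting that an administrator or another tool changed during boot.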
> Attached are:
> - a pack file dumper
I think it would make a lot of sense to include this in systemd,
perhaps renamed to src/readahead/test-readahead-dump.c or so...
> Comments? If the proposed solution seems agreed upon, I will implement
> a patch that accomplishes the above procedure. Since the hard part is
> mapping the sysfs nodes, I haven't done this just yet.
Yes, I am sold. Happy to take a patch.
Lennart
--
Lennart Poettering - Red Hat, Inc.