[systemd-devel] readahead and read_ahead_kb - reading in way too much data - a proposed solution

Lennart Poettering lennart at poettering.net
Tue Apr 3 13:50:31 PDT 2012


On Tue, 03.04.12 12:17, Kok, Auke-jan H (auke-jan.h.kok at intel.com) wrote:

Heya,

> The modification to the read_ahead_kb is done by a static C binary
> that runs in all cases and modifies the sysfs files so the comparison
> is not skewed - even in the default case the static binary runs. After
> the modification it exec()'s systemd as usual.
> 
> As you can see, the total RA volume decreases significantly when we
> lower the RA size before we boot. In general, though, we don't want to
> keep it low: to retain the speedup from readahead, the size should be
> put back to at least the default (*experimental - mostly done by a
> kernel developer saying "hey, this kernel compile is a great
> benchmark for testing a good RA size setting").
> 
> So, at a minimum we want to revert any lower RA setting back to the
> default once we're done with readahead collection.
> 
> However, because the replay service uses fadvise WILLNEED, a higher
> default RA size is not a problem at all on subsequent boots - since we
> tell the kernel exactly which pages we need, the kernel is unlikely to
> see page faults for pages we have not read ahead - and so the RA
> volume stays low on subsequent boots.

Ah, OK, makes a ton of sense. When you first mentioned this on IRC I got
the impression that WILLNEED was also subject to RA, but it actually
isn't; it's just that during the initial collection we don't want
RA. Got it.

> Given the above data, the solution is relatively simple:
> 
> collector service:
> 
> if (readahead pack file does not exist) {
>     - map block device to bdi/read_ahead_kb node
>     - get the current read_ahead_kb
>     - store this default somewhere
>     - lower the read_ahead_kb to 16 KB or 8 KB
> }
> 
> normal collector code...
> 
> if (have stored a default read_ahead_kb) {
>     - read the bdi/read_ahead_kb node
>     - if (current value == value we put in there earlier) {
>             restore the original
>       }
> }

Sounds good.

The tuning thing should be easy to do, similar to the code that is
already in bump_request_nr().
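
As a rough sketch of the tuning steps proposed above (not the eventual patch), the lower/restore logic could look like this in C. All function names here are hypothetical, and the node path is passed in by the caller. The sketch assumes the restore should only happen if the node still holds the lowered value, i.e. nobody else has changed it in the meantime.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Read a single unsigned value from a sysfs-style file. */
int read_kb(const char *node, unsigned *value) {
    FILE *f;
    int r;

    assert(node != NULL && value != NULL);

    f = fopen(node, "r");
    if (!f)
        return -errno;
    r = fscanf(f, "%u", value) == 1 ? 0 : -EINVAL;
    fclose(f);
    return r;
}

/* Write a single unsigned value to a sysfs-style file. */
int write_kb(const char *node, unsigned value) {
    FILE *f;

    assert(node != NULL);

    f = fopen(node, "w");
    if (!f)
        return -errno;
    if (fprintf(f, "%u\n", value) < 0) {
        fclose(f);
        return -EIO;
    }
    return fclose(f) == 0 ? 0 : -errno;
}

/* Before collection: remember the current setting, then lower it. */
int ra_lower(const char *node, unsigned lowered, unsigned *saved) {
    int r = read_kb(node, saved);
    if (r < 0)
        return r;
    return write_kb(node, lowered);
}

/* After collection: restore the saved value, but only if the node
 * still contains the value we wrote. Returns 1 if restored, 0 if the
 * node was left alone, negative errno-style value on error. */
int ra_restore(const char *node, unsigned lowered, unsigned saved) {
    unsigned current;
    int r = read_kb(node, &current);
    if (r < 0)
        return r;
    if (current != lowered)
        return 0; /* somebody else changed it; don't clobber */
    r = write_kb(node, saved);
    return r < 0 ? r : 1;
}
```

The collector would call ra_lower() only when no pack file exists yet, and ra_restore() once collection is finished.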

> Attached are:
> - a pack file dumper

I think it would make a lot of sense to include this in systemd, perhaps
renamed to src/readahead/test-readahead-dump.c or so...

> Comments? If the proposed solution seems agreed upon, I will implement
> a patch that accomplishes the above procedure. Since the hard part is
> mapping the sysfs nodes, I haven't done this just yet.

Yes, I am sold. Happy to take a patch.
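
One possible starting point for the sysfs mapping, as a sketch only: take st_dev of a file on the file system in question and format it as /sys/class/bdi/MAJOR:MINOR/read_ahead_kb. The function names are hypothetical. This deliberately skips the hard cases: as far as I know, partitions share the bdi of their parent disk, and file systems with anonymous device numbers (btrfs, NFS) register their bdi under a different name, so the resulting path may not exist and would need a further lookup.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Format the bdi read_ahead_kb path for a device number.
 * Returns 0 on success, -1 if the buffer is too small. */
int bdi_node_for_dev(dev_t dev, char *buf, size_t size) {
    int n;

    assert(buf != NULL);

    n = snprintf(buf, size, "/sys/class/bdi/%u:%u/read_ahead_kb",
                 major(dev), minor(dev));
    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}

/* Derive the candidate node from the device backing 'path'.
 * Assumption: st_dev names the bdi directly, which does not hold
 * for partitions or for file systems with anonymous devices. */
int bdi_node_for_path(const char *path, char *buf, size_t size) {
    struct stat st;

    assert(path != NULL);

    if (stat(path, &st) < 0)
        return -1;
    return bdi_node_for_dev(st.st_dev, buf, size);
}
```

The caller still has to check that the returned path actually exists before reading or writing it.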

Lennart

-- 
Lennart Poettering - Red Hat, Inc.

