[systemd-devel] readahead and read_ahead_kb - reading in way too much data - a proposed solution

Kok, Auke-jan H auke-jan.h.kok at intel.com
Tue Apr 3 14:19:48 PDT 2012


On Tue, Apr 3, 2012 at 1:50 PM, Lennart Poettering
<lennart at poettering.net> wrote:
> On Tue, 03.04.12 12:17, Kok, Auke-jan H (auke-jan.h.kok at intel.com) wrote:
>
> Heya,
>
>> The modification to the read_ahead_kb is done by a static C binary
>> that runs in all cases and modifies the sysfs files so the comparison
>> is not skewed - even in the default case the static binary runs. After
>> the modification it exec()'s systemd as usual.
>>
>> As you can see, the total RA volume is significantly decreased by
>> lowering the RA size before we boot, but, in general, we don't want to
>> keep it low at all - to keep the speedup from readahead, the RA size
>> should be put back to the default at a minimum (*the default is
>> experimental - mostly arrived at by kernel developers saying "hey this
>> kernel compile is a great benchmark for testing a good RA size
>> setting").
>>
>> So, at a minimum we want to revert any lower RA setting back to the
>> default once we're done with readahead collection.
>>
>> However, due to the usage of fadvise WILLNEED in the replay service,
>> in subsequent boots, we don't have a problem with a higher default RA
>> size at all - since we tell the kernel exactly which pages we need,
>> the kernel is unlikely to see page faults for pages we have not
>> read ahead - and so the RA volume stays low on subsequent boots.
>
> Ah, OK, makes a ton of sense. When you first mentioned this on IRC I got
> the impression that WILLNEED was also subject to RA, but it actually
> isn't, it's just that during the initial collection we don't want
> RA. Got it.
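
To illustrate the WILLNEED point: a minimal sketch of prefetching one
explicit byte range with posix_fadvise(), so the kernel is told exactly
what to read. The helper name prefetch_range() and its path/offset/len
parameters are made up for illustration; this is not the actual replay
code, and the pack file format is not modelled here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel to prefetch len bytes of path,
 * starting at offset. Returns 0 on success or a positive errno value. */
static int prefetch_range(const char *path, off_t offset, off_t len) {
        int fd, r;

        assert(path);

        fd = open(path, O_RDONLY | O_CLOEXEC);
        if (fd < 0)
                return errno;

        /* posix_fadvise() returns the error number directly, not via errno */
        r = posix_fadvise(fd, offset, len, POSIX_FADV_WILLNEED);

        close(fd);
        return r;
}
```

Something like prefetch_range("/usr/lib/libfoo.so", 0, 65536) would then
request only the first 64 KB of that (made up) file.
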
>
>> Given above data, the solution is relatively simple:
>>
>> collector service:
>>
>> if (readahead pack file does not exist) {
>>     - map block device to bdi/read_ahead_kb node
>>     - get the current read_ahead_kb
>>     - store this default somewhere
>>     - lower the read_ahead_kb to 16kb or 8kb
>> }
>>
>> normal collector code...
>>
>> if (have read a default read_ahead_kb) {
>>     - read the bdi/read_ahead_kb node
>>     - if (value just read == value we put in there earlier) {
>>           restore the stored default
>>       }
>> }
>
> Sounds good.
>
> The tuning thing should be easy to do, similar to the code that is
> already in bump_request_nr().

ah, nice, I was hoping that I could use something like that without
having to reinvent much code.
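
For reference, a rough sketch of the save/lower/restore logic from the
pseudocode above. It assumes the caller already has the path of the
read_ahead_kb node (the sysfs mapping is not done here), and the names
read_kb(), write_kb(), ra_lower() and ra_restore() are made up for this
example, not existing systemd functions.

```c
#include <assert.h>
#include <stdio.h>

/* Read a single unsigned value from a sysfs-style file. 0 on success. */
static int read_kb(const char *path, unsigned long *value) {
        FILE *f;
        int ok;

        assert(path && value);

        f = fopen(path, "r");
        if (!f)
                return -1;
        ok = fscanf(f, "%lu", value) == 1;
        fclose(f);
        return ok ? 0 : -1;
}

/* Write a single unsigned value to a sysfs-style file. 0 on success. */
static int write_kb(const char *path, unsigned long value) {
        FILE *f;

        assert(path);

        f = fopen(path, "w");
        if (!f)
                return -1;
        if (fprintf(f, "%lu\n", value) < 0) {
                fclose(f);
                return -1;
        }
        return fclose(f) == 0 ? 0 : -1;
}

/* Remember the current setting in *saved, then write the lowered one. */
static int ra_lower(const char *path, unsigned long lowered, unsigned long *saved) {
        if (read_kb(path, saved) < 0)
                return -1;
        return write_kb(path, lowered);
}

/* Restore the saved setting, but only if the node still holds the value
 * we wrote; if somebody else changed it in the meantime, leave it alone.
 * Returns 1 if restored, 0 if left untouched, -1 on error. */
static int ra_restore(const char *path, unsigned long lowered, unsigned long saved) {
        unsigned long now;

        if (read_kb(path, &now) < 0)
                return -1;
        if (now != lowered)
                return 0;
        return write_kb(path, saved) < 0 ? -1 : 1;
}
```

The collector would call ra_lower(node, 16, &saved) before collecting
when no pack file exists, and ra_restore(node, 16, saved) once it is
done.
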

>> Attached are:
>> - a pack file dumper
>
> I think it might make a lot of sense to include this in systemd. Might
> make sense to rename it to src/readahead/test-readahead-dump.c or so...

that was mainly the intent - even just in source form or non-installed
it's valuable.

>> Comments? If the proposed solution seems agreed upon, I will implement
>> a patch that accomplishes the above procedure. Since the hard part is
>> mapping the sysfs nodes, I haven't done this just yet.
>
> Yes, I am sold. Happy to take a patch.

alright then, I'll start working on a mergeable patch.

Auke

