[systemd-devel] Improving module loading

Tue Dec 23 06:37:50 PST 2014

On Tue, Dec 23, 2014 at 1:21 PM, Hoyer, Marko (ADITG/SW2)
<mhoyer at de.adit-jv.com> wrote:
>> -----Original Message-----
>> From: Lucas De Marchi [mailto:lucas.de.marchi at gmail.com]
>> Sent: Monday, December 22, 2014 7:00 PM
>> To: Lennart Poettering
>> Cc: Hoyer, Marko (ADITG/SW2); systemd-devel at lists.freedesktop.org
>> Subject: Re: [systemd-devel] Improving module loading
>>
>> On Mon, Dec 22, 2014 at 1:04 PM, Lennart Poettering
>> <lennart at poettering.net> wrote:
>> > On Sat, 20.12.14 10:45, Hoyer, Marko (ADITG/SW2) (mhoyer at de.adit-jv.com) wrote:
>> >
>> >> I had such a discussion earlier with some of the systemd guys. My
>> >> intention was to introduce an additional unit for module loading for
>> >> exactly the reason you mentioned. The following (reasonable) outcome
>> >> was:
>> >>
>> >> - It is dangerous to load kernel modules from PID 1 since module
>> >>   loading can get stuck
>> >
>> >> - Since modules are actually loaded with the thread that calls the
>> >>   syscall, systemd would need additional threads
>> >
>> >> - Multithreading is not really aimed for in systemd, for stability
>> >>   reasons
>> >>
>> >> The probably safest way to do what you intended is to use an
>> >> additional process to load your modules, which could be easily done
>> >> by using ExecStartPre= in a service file. We are doing it exactly
>> >> this way not with kmod but with a tool that loads modules in
>> >> parallel.
>> >
>> > I'd be willing to merge a good patch that beefs up
>> > systemd-modules-load to load the specified modules in parallel, with
>> > one thread for each.
>> >
>> > We already have a very limited number of threaded bits in systemd, and
>> > I figure it would be OK to do that for this too.
>> >
>> > Please keep the threading minimal though, i.e. one kmod context per
>> > thread, so that we need no synchronization and no locking. One thread
>> > per module, i.e. no worker thread logic with thread reusing. Also,
>> > please set a thread name, so that a hanging module load only hangs one
>> > specific thread and the backtrace shows which module is at fault.
>>
>> I'm skeptical you would get any speed up for that. I think it would be
>> better to have some numbers shared before merging such a thing.
>>
>
> As I already outlined in my answer to Greg, parallel loading was not our main motivation for inventing something new. We found that parallel loading gave us a benefit for some of our modules, so we integrated this feature. Since we are not using udevd during startup at all, most of our modules are loaded manually. I have no idea how things are distributed between systemd-modules-load and udevd in conventional Linux desktop or server systems. If only a handful of modules are actually loaded using systemd-modules-load, it is probably not worth optimizing at this end.
>
> Does anyone have concrete numbers on how many modules are loaded "by hand" using systemd-modules-load in a conventional system?

In a stock Fedora/Arch (and probably others, but didn't check)
systemd-modules-load is not used at all. It is mostly there to make it
simple to work around sub-par kernel modules, but most have been fixed
by now, so it is increasingly irrelevant.

I agree that it is probably not worth doing lots of
systemd-modules-load-specific hacks to speed it up, but if we split
out the worker-pool logic from udev (which I'm currently working on as
we need it in more places), we can optimize that in a generic way and
if the numbers show that systemd-modules-load would benefit from using
it, I'd be all for hooking that up too (as doing so would then be
trivial).
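
To make the two options in this thread concrete, here is a minimal
sketch contrasting the one-named-thread-per-module proposal quoted
above with a bounded worker pool. It is only an illustration under
stated assumptions: it shells out to modprobe in a child process rather
than using libkmod contexts, and the names modprobe(),
load_thread_per_module(), load_with_pool() and the default of 4 workers
are hypothetical, not anything that exists in systemd or udev.

```python
import subprocess
import threading
from concurrent.futures import ThreadPoolExecutor


def modprobe(name):
    """Hypothetical loader: run modprobe in a child process (needs root)."""
    return subprocess.run(["modprobe", name], check=False).returncode


def load_thread_per_module(modules, load=modprobe):
    """One short-lived thread per module, with no thread reuse.

    Each thread is named after its module, so a hanging load can be
    attributed to a specific module when inspecting the threads.
    """
    results = {}

    def worker(name):
        # Each thread writes only its own key, so no locking is needed.
        results[name] = load(name)

    threads = [
        threading.Thread(target=worker, args=(m,), name=f"load-{m}")
        for m in modules
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def load_with_pool(modules, load=modprobe, max_workers=4):
    """Bounded worker pool: at most max_workers loads run at once."""
    modules = list(modules)
    with ThreadPoolExecutor(max_workers=max_workers,
                            thread_name_prefix="modload") as pool:
        # pool.map() yields results in the same order as the input.
        return dict(zip(modules, pool.map(load, modules)))
```

Both functions return a mapping from module name to the loader's return
value, and both accept a replacement load callable, so the two
strategies can be timed against each other with a stub loader before
touching real modules.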

>> If you have 1 context per module/thread you will need to initialize
>> each context which is really the most expensive part in userspace,
>> particularly if finit_module() is being used (which you should unless
>> you have restrictions on the physical size taken by the modules). Bear
>> in mind the udev logic has only 1 context, so the initialization is
>> amortized among the multiple module load calls.
>>
>
> This does not really match my experience. Once the kmod binary cache is in the VFS page buffer cache, getting a new context is really fast, even in new processes. The expensive thing about udev is that it starts forking off worker processes very quickly, so in the end at least one new context per process is created there too. Additionally, the people who decide to use systemd-modules-load to load specific modules have good reasons for that. A prominent one is probably that udevd does not work for the respective module because no concrete device is coupled with it. I think there are not that many kernel modules that need to be handled like this, which again raises the question of whether it is really worth pimping systemd-modules-load.

I'm not aware of any kernel modules that legitimately need to be
loaded in this way (i.e., all the ones that do can/should be fixed).

>> For the "don't load until it's needed" I very much prefer the static
>> nodes approach we have. Shouldn't this be used instead of filling
>> modules-load.d with lots of entries?
>
> We are not using systemd-modules-load for this approach, since it tries to load all modules in one shot. We execute our tool several times during startup to bring up hardware piece by piece, exactly at the point where it is needed. The tool is either executed like modprobe or with a configuration file containing a set of modules to be loaded in one shot, plus some other settings needed for synchronization and setup.

I'd be interested in understanding better the need for this
serialization. What bottleneck do you hit if you load modules eagerly?
CPU/IO? Could this be worked around by tweaking udev (limiting the
number of workers, fiddling with the order of triggering, tweaking
cgroup properties of the processes doing the loading,...)? It would be
nice to understand what a generic solution to this would look like so
we could consider if there is anything we want to improve in udev
here...

Cheers,

Tom