[systemd-devel] Improve boot-time of systemd-based device, revisited
Chaiken, Alison
alison at she-devel.com
Fri Jun 19 00:34:17 PDT 2015
cee1 <fykcee1 at gmail.com> writes:
> 2. Add a kernel sockopt for AF_UNIX to increase the maximum datagram
> queue length for SOCK_DGRAM sockets.
cee1, are you aware of the (hopefully) pending full merge of kdbus in
kernel 4.2? And that it is essentially a bottom-up redesign of IPC
that supports the existing D-Bus API?
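Incidentally, the datagram queue length your point 2 refers to is already tunable system-wide via sysctl (`net.unix.max_dgram_qlen`); what you propose, a per-socket sockopt, would be new kernel work. A sketch of raising the global limit (the file name and value are my own choices, not recommendations):

```ini
# /etc/sysctl.d/90-unix-dgram.conf  (hypothetical file name)
# Raise the maximum number of datagrams that may be queued on an
# AF_UNIX SOCK_DGRAM socket, system-wide.
net.unix.max_dgram_qlen = 512
```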
> 3. Since boot-up tends to be IO bound, some IO optimization:
That's not true in many systems, of course, notably the ones that need
network to come up, as discussed previously.
> 3.1 "consider disabling readahead collection in the shipped devices,
> but leave readahead replay enabled."
cee1, are you aware that readahead is deprecated in systemd and has not
been included since about release 216? Some of us in automotive are
still working on it. I have some patches here
https://github.com/chaiken/systemd-hacks/tree/packfilelist
against 215 that add various features. We may soon be forward-porting
these, along with readahead itself, to the latest version.
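For systems still running the 215-era readahead, your suggestion 3.1 can be expressed declaratively with a systemd preset file, since collection and replay are separate units. A sketch (the file name and priority prefix are my own choices):

```ini
# /etc/systemd/system-preset/50-readahead.preset  (hypothetical name)
# Replay the previously recorded pack file at boot, but do not
# collect a new one on the shipped device.
enable systemd-readahead-replay.service
disable systemd-readahead-collect.service
```

Note that presets take effect when units are enabled via `systemctl preset` (e.g. at image build or first boot), not retroactively.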
> Readahead doesn't work very well in my experiments,
I spent considerable time performing boot experiments on production
hardware, including trying different I/O schedulers. My conclusion
was that readahead provides benefits in boot-time only when large,
monolithic binaries start. If these gigantic applications were
rearchitected to be more modular and could load libraries dynamically
when needed instead of all at once, I suspect that the speedup
associated with readahead would vanish. Nonetheless, readahead may
still speed up boot on real hardware under product-relevant
conditions.
The problem is actually quite complex in the case of eMMC boot devices,
which have their own sophisticated embedded controllers. To properly
optimize the whole system, we need to know the behavior of that
controller and model what happens at boot in the full system using
different Linux I/O schedulers and readahead strategies. Unfortunately
we don't have all that information. My suspicion is that we might
actually boot faster from raw NAND flash, but then of course we have to
perform our own wear-levelling and block sparing.
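As a concrete illustration of the scheduler experiments: the I/O scheduler can be pinned per-device with a udev rule along these lines (the device name, rule file name, and scheduler choice here are examples, not recommendations):

```ini
# /etc/udev/rules.d/60-iosched.rules  (hypothetical file name)
# Select the deadline scheduler for the eMMC boot device.
ACTION=="add|change", KERNEL=="mmcblk0", SUBSYSTEM=="block", \
    ATTR{queue/scheduler}="deadline"
```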
> The replaying sequence: A, B, C
> The actual requesting sequence: C, B, A
> If we can figure out the requesting sequence, it can achieve real read
> "ahead"[1].
I have verified in detail that readahead worked as intended: the degree
to which the system was I/O-bound did decrease, even in cases where
there was no net speedup.
>
> 4. Get rid of systemd-cgroups-agent. This requires introduction of a
> new kernel interface to get notifications for cgroups running empty,
> for example via fanotify() on cgroupfs.
> Is there any related work in progress?
Are you aware of "JoinControllers"? You appear to be running old
versions of the software, which doesn't garner much sympathy from
developers.
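For reference, JoinControllers= is set in system.conf; it takes space-separated groups of comma-joined controllers to mount in shared hierarchies. A sketch (this particular grouping is just an example):

```ini
# /etc/systemd/system.conf
[Manager]
# Mount these cgroup controllers in joined hierarchies.
JoinControllers=cpu,cpuacct net_cls,net_prio
```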
> These make it hard to use systemd in a customized system.
The Linux services business benefits from churn in userspace code . . .
> What I call
> for is to make the cold boot logic "declarative", something like:
> main.c:
> log_to_A
> mount_X
> mount_Y
Good news: you are free to choose SysVInit.
> I wonder whether a property system also makes sense in systemd's world?
systemd unit files are already declarative lists of properties, right?
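To make that concrete: the "mount_X" step in your pseudocode is already expressible as a declarative mount unit (all names below are illustrative):

```ini
# var-data.mount  (hypothetical; the file name must match Where=)
[Unit]
Description=Example data partition

[Mount]
What=/dev/disk/by-label/data
Where=/var/data
Type=ext4

[Install]
WantedBy=local-fs.target
```

Each directive is a property the manager reads; there is no imperative main() to write.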
Best wishes,
Alison
---
"Tunable parameter values found on the Internet . . . [are] akin to
raiding someone else's medicine cabinet . . . " -- Brendan Gregg,
_Systems Performance_, p.23
http://brendangregg.com/linuxperf.html