[systemd-devel] Start-up resource and prioritization control

Tue May 20 06:16:50 PDT 2014

On Tue, May 20, 2014 at 1:46 PM, Tom Gundersen <teg at jklm.no> wrote:
> On Tue, May 20, 2014 at 11:33 AM, Umut Tezduyar Lindskog
> <umut at tezduyar.com> wrote:
>> Thanks for your thoughts and sorry about the delay.
>>
>> On Thu, May 15, 2014 at 11:47 AM, Tom Gundersen <teg at jklm.no> wrote:
>>> Hi Umut,
>>>
>>> Sorry for digging out an old thread, but it appears it has not yet
>>> been answered.
>>>
>>> On Thu, Apr 24, 2014 at 11:15 AM, Umut Tezduyar Lindskog
>>> <umut at tezduyar.com> wrote:
>>>> We are starting many services between basic.target and
>>>> multi-user.target at the same time, and because of this we are
>>>> suffering from the following two problems. What can we do to
>>>> overcome them?
>>>>
>>>> 1) We would like to start a subset of the services that are scheduled
>>>> to start between basic.target and multi-user.target before the rest,
>>>> and there is no built-in way to satisfy our needs.
>>>
>>> The reason for this is purely scheduling, right? I looked at this sort
>>> of thing in the past (and noticed that such tweaks could indeed give
>>> quite noticeable performance benefits), however, we discussed this and
>>> I was convinced that we should not try to play such games in systemd,
>>> rather we should let the kernel do the scheduling and possibly provide
>>> it some hints (see below).
>>
>> Yes it is for scheduling. The output of a subset of services is more
>> important than the rest.
>>
>>>
>>>> a) We could use Before=, After= on services but the downside of this
>>>> kind of dependency is that we have to edit every single service file
>>>> with Before=, After= directives. This is not the best option when the
>>>> subset of services we would like to start early might change between
>>>> hardware or product configurations.
>>>
>>> That approach would probably work, but I agree it is a hack...
>>>
>>>> b) The ongoing patch
>>>> http://lists.freedesktop.org/archives/systemd-devel/2014-March/018220.html
>>>> is promising but it seems to have stalled. Any reason?
>>>
>>> Looks like the correct approach to me. Not sure what's going on with
>>> it though (if anything).
>>>
>>>> c) A service running before basic.target, querying systemd with
>>>> "systemctl show -p [Wants Requires] default.target" and adding
>>>> Before=, After= dependencies on services at runtime. That doesn't
>>>> seem very efficient.
>>>
>>> Might also work as a temporary hack, but long-term we'd hopefully get b)...
>>>
>>>> 2) Due to starting too many services at once and the resulting
>>>> frequent context switches (flushing of caches), we see that boot
>>>> time is longer than when starting the services sequentially.
>>>>
>>>> a) Proposing a configuration to limit the number of jobs that are in
>>>> "activating" state.
>>>
>>> Wouldn't this easily deadlock? Imagine you have two services on your
>>> system, A and B. Each of them needs to communicate with the other
>>> before it can become fully active. If your limit of active jobs is 2
>>> there is no problem, but if it is 1 it would always deadlock...
>>
>> I don't think so. If A wants to communicate with B before B is
>> started, then the communication should be via some kind of on-demand
>> activation (socket, dbus, etc). We should be able to differentiate
>> whether the service systemd is trying to start is part of the initial
>> transaction or someone requested systemd to start it.
>
> You mean that services started on-demand by systemd should be exempt
> from the limit on the number of activating services? So in my scenario
> both would indeed start at the same time, even if I set a limit of 1?

Correct. It should work for your case.

>
> That would only work if systemd is actually aware of the dependency
> (i.e. it is socket activated as you propose, or an entirely new
> StartupDependency attribute is introduced). The services may otherwise
> communicate in some other way (through files or whatnot), which we may
> not be aware of, in which case you get the same problem.

I definitely think some additional information is needed to separate
the initial transaction list from later start requests to systemd. I
can't forecast how much of a rewrite it would be, but in all cases
systemd should know that some kind of state has changed (a path is
updated, a dbus activation signal is received, epoll fires on a socket,
etc.) and it should act on the state change (e.g. start a new service).
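
To make the idea concrete, here is a minimal sketch in Python. It is
not systemd code, only a toy model of the proposal: jobs from the
initial transaction count against a limit, while jobs started on demand
are exempt. The function name, the "needs" mapping and the rule that a
job becomes active once its peer is at least running are all
assumptions made for the illustration.

```python
from collections import deque

def simulate(initial, needs, limit, exempt_on_demand=True):
    """Toy model: return (activation order, jobs stuck in a deadlock)."""
    pending = deque(initial)   # initial transaction, not yet started
    counted = set()            # activating jobs that count against the limit
    exempt = set()             # activating jobs that were started on demand
    active = []
    while pending or counted or exempt:
        progressed = False
        # Admit queued jobs while there is room under the limit.
        while pending and len(counted) < limit:
            counted.add(pending.popleft())
            progressed = True
        # On-demand activation: a running job touches a peer that is queued.
        if exempt_on_demand:
            for job in sorted(counted | exempt):
                peer = needs.get(job)
                if peer in pending:
                    pending.remove(peer)
                    exempt.add(peer)
                    progressed = True
        # A job finishes activating once its peer (if any) is reachable.
        running = counted | exempt
        for job in sorted(running):
            peer = needs.get(job)
            if peer is None or peer in running or peer in active:
                active.append(job)
                progressed = True
        counted -= set(active)
        exempt -= set(active)
        if not progressed:
            return active, sorted(set(pending) | counted | exempt)
    return active, []

if __name__ == "__main__":
    needs = {"A": "B", "B": "A"}   # Tom's scenario: A and B need each other
    print(simulate(["A", "B"], needs, limit=1, exempt_on_demand=False))
    print(simulate(["A", "B"], needs, limit=1, exempt_on_demand=True))
```

With a limit of 1 and no exemption the model gets stuck with both A and
B unfinished. With the exemption, B is pulled in on demand and both
become active. It only works because the model knows about the
dependency, which is the same caveat Tom raised.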

>
> I'd really try to look first at the possibility of instructing the
> kernel to do the right scheduling of the various tasks before starting
> to add new things to systemd here...
>
>> Even when we take in 1-b), we will still have the problem for the
>> remaining services that are given lower priority by 1-b). I am still
>> proposing something like 2-a). I will try to draw some diagrams to
>> make it clearer.
>
> Wouldn't this be solved by telling the kernel to schedule the starting
> services with high latency (or whatever the terminology is), i.e.,
> give each of them a relatively large timeslice? That would decrease
> the flushing, but at the same time avoid any issues with deadlocks
> etc. It should also give us the flexibility to give some services low
> latency if that is required for them etc (think udev/systemd/dbus and
> other things which would otherwise block boot).

This is exactly what the cpu.shares cgroup property does and that is
what the patch posted on the ML is trying to utilize. In theory we
should be able to prioritize certain services with the posted patch.
But the frequent context switching problem still remains for the
non-prioritized services.
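
As a rough illustration of why the problem remains, here is a small
Python sketch of how cpu.shares divides CPU time between sibling
cgroups that are all busy: each gets its share divided by the sum of
all shares. The service names and share values are made up for the
example, and the model ignores idle groups and multiple CPUs.

```python
def cpu_fractions(shares):
    """Fraction of CPU each busy cgroup gets, proportional to cpu.shares."""
    total = sum(shares.values())
    return {name: value / total for name, value in shares.items()}

if __name__ == "__main__":
    # Hypothetical units: one prioritized service, two left at 1024.
    shares = {"important.service": 4096, "a.service": 1024, "b.service": 1024}
    for name, fraction in cpu_fractions(shares).items():
        print(f"{name}: {fraction:.2f}")
```

Here the prioritized service gets two thirds of the CPU, but a.service
and b.service still run side by side with one sixth each, so they keep
switching in and out against each other.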

I have had another thought about this and I might have something else
here. Inspired by the posted patch, I am proposing something like
"StartupCPUShares=*" (or any other symbol).

What StartupCPUShares=* will tell systemd is that the service really
doesn't care about its cpu.shares value. If we have 100 services with
StartupCPUShares=*, then in combination with some kind of
NumberOfActiveServices value, systemd will adjust the cpu.shares of
NumberOfActiveServices of them at a time until they are activated.
Hope it makes sense.
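
A minimal sketch of what I have in mind, in Python rather than systemd
code. Neither StartupCPUShares=* nor NumberOfActiveServices exists
today. The function name and the "boosted" and "parked" share values
are placeholders chosen only for the illustration.

```python
def startup_shares(services, activated, window, boosted=1024, parked=2):
    """Pick cpu.shares for services that declared StartupCPUShares=*.

    At most `window` (the hypothetical NumberOfActiveServices) services
    that are still starting get the boosted value. The rest are parked
    on a very low value until a slot frees up.
    """
    shares = {}
    slots = window
    for name in services:
        if name in activated:
            continue  # already active, the startup value no longer applies
        if slots > 0:
            shares[name] = boosted
            slots -= 1
        else:
            shares[name] = parked
    return shares

if __name__ == "__main__":
    units = ["s1.service", "s2.service", "s3.service", "s4.service"]
    print(startup_shares(units, activated=set(), window=2))
    # Once s1.service is active, s3.service moves into the window.
    print(startup_shares(units, activated={"s1.service"}, window=2))
```

The kernel still does all the scheduling. systemd would only recompute
the values each time a service finishes activating, so nothing is
blocked outright and the deadlock concern from 2-a) should not apply.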

Thoughts?
Umut

>
>>>> We are aware that our problem is mostly embedded platform specific. We
>>>> could solve our problem statically with what systemd offers, but a
>>>> static solution that is customized for one piece of hardware is not
>>>> the best solution for another. Having static solutions per hardware is
>>>> extremely hard to maintain and we would like to solve this problem
>>>> upstream instead of with downstream magic.
>>>
>>> I think this sounds universally useful, and it would be cool if we
>>> could get the startup resource logic upstream...
>>
>> We also agree that this is universally useful.
>
> I noticed a new patch was posted. Might be useful to try if that helps
> your setup (I haven't had the chance to look at it yet).
>
> Cheers,
>
> Tom