[systemd-devel] Support for large applications

Avi Kivity avi at scylladb.com
Wed Feb 17 18:47:33 UTC 2016


On 02/17/2016 03:56 PM, Zbigniew Jędrzejewski-Szmek wrote:
> On Wed, Feb 17, 2016 at 02:35:55PM +0200, Avi Kivity wrote:
>> We are using systemd to supervise our NoSQL database and are
>> generally happy.
>>
>> A few things will help even more:
>>
>> 1. log core dumps immediately rather than after the dump completes
>>
>> A database will often consume all memory on the machine; dumping
>> 120GB can take a lot of time, especially if compression is enabled.
>> As the situation is now, there is a period of time where it is
>> impossible to know what is happening.
>>
>> (I saw that 229 improves core dumps, but did not see this specifically)
> The coredump is logged afterwards because that's the only way to
> include all information (including the compressed file name) in one
> log message.

Maybe we can log two messages if we detect that the core is very
large, or that it will take more than a couple of seconds to store
it.
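
Something like the following rough sketch against the public
sd-journal API (the helper names and the exact split into two
entries are illustrative, not what systemd-coredump does today):

    #include <sys/types.h>
    #include <syslog.h>
    #include <systemd/sd-journal.h>

    /* Sketch only: log one entry the moment the crash is
     * intercepted, and a second one once the (possibly huge)
     * file has actually been written out. */
    static void announce_start(pid_t pid, const char *comm) {
            sd_journal_send("MESSAGE=Process %d (%s) dumped core, writing coredump ...",
                            (int) pid, comm,
                            "PRIORITY=%i", LOG_ERR,
                            "COREDUMP_PID=%d", (int) pid,
                            NULL);
    }

    static void announce_done(pid_t pid, const char *comm,
                              const char *path) {
            sd_journal_send("MESSAGE=Coredump of %d (%s) stored as %s",
                            (int) pid, comm, path,
                            "PRIORITY=%i", LOG_ERR,
                            "COREDUMP_FILENAME=%s", path,
                            NULL);
    }

The first entry makes "coredump in progress" visible immediately;
the second carries the final file name once it exists.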

>   But there are two changes which might mitigate the problem:
> - semi-recently we switched to lz4, which compresses significantly faster,
>    have you tried that?

I don't think I have yet, but consider that memory sizes are growing
rapidly (e.g. byte-addressable non-volatile memory) while core counts
stay large; I doubt improvements in compression speed can keep pace
with that.
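
If I understand correctly, whether on-disk cores get compressed at
all is controlled in /etc/systemd/coredump.conf, while the algorithm
(lz4 vs. xz) is chosen when systemd is built; something along these
lines (example values, not recommendations):

    [Coredump]
    Storage=external
    Compress=yes
    # Example caps for a big-memory database host; cores from
    # processes larger than ProcessSizeMax= are skipped entirely.
    ProcessSizeMax=128G
    ExternalSizeMax=128G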

>
> - recently the responsibility of writing core dumps was split out to
>    a service. I'm not sure how that influences the time when the log
>    message is written.

I'll try it out; it may take some time, because I don't want to
upgrade my large machines to F24 yet.

btw, I hope that with this change the service is only restarted after
the dump is complete, otherwise OOM is likely.

>
>> 2. parallel compression of core dumps
>>
>> As well as consuming all of memory, we also consume all cpus.  Once
>> we dump core we may as well use those cores for compressing the huge
>> dump.
> This should be implemented in the compression library. The compressor
> does not seem to be threaded, but if it were, we would try to make use of it.
> OTOH, single-threaded lz4 is able to produce ~500MB/s of compressed
> output, so you'd need a really fast disk to go above that.

I happen to have a really fast disk, reaching 4X that, and this is 
common for database users.
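
To make the idea concrete, here is a rough sketch of chunked
parallel compression using the plain lz4 block API, one thread per
chunk (output framing, thread pooling and error handling are all
elided; the names are mine):

    #include <lz4.h>
    #include <pthread.h>
    #include <stdlib.h>

    #define CHUNK (64 * 1024 * 1024)    /* 64 MB per worker */

    struct job {
            const char *src;
            char *dst;
            int src_size;
            int dst_size;               /* filled in by the worker */
    };

    static void *compress_chunk(void *arg) {
            struct job *j = arg;
            j->dst_size = LZ4_compress_default(j->src, j->dst, j->src_size,
                                               LZ4_compressBound(j->src_size));
            return NULL;
    }

    /* Compress `size` bytes at `buf`, one thread per chunk. */
    static void compress_parallel(const char *buf, size_t size) {
            size_t n = (size + CHUNK - 1) / CHUNK;
            pthread_t *tids = calloc(n, sizeof *tids);
            struct job *jobs = calloc(n, sizeof *jobs);

            for (size_t i = 0; i < n; i++) {
                    size_t off = i * CHUNK;
                    jobs[i].src = buf + off;
                    jobs[i].src_size = (int) (size - off < CHUNK ? size - off : CHUNK);
                    jobs[i].dst = malloc(LZ4_compressBound(jobs[i].src_size));
                    pthread_create(&tids[i], NULL, compress_chunk, &jobs[i]);
            }
            for (size_t i = 0; i < n; i++)
                    pthread_join(tids[i], NULL);
            /* ... write jobs[0..n-1] out in order, with per-chunk
             * headers; freeing the buffers is elided ... */
    }

On a machine whose cores sit idle after the crash, this should scale
compression throughput roughly with the thread count until the disk
becomes the bottleneck.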



>
>> 3. watchdog during startup
>>
>> Sometimes we need to perform expensive operations during startup
>> (log replay, rebuild from network replica) before we can start
>> serving. Rather than configure a huge start timeout, I'd prefer to
>> have the service report progress to systemd so that it knows that
>> startup is still in progress.
> Zbyszek
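
To illustrate what I mean for point 3, a minimal sketch from the
service side using the existing sd_notify() API (assuming a
Type=notify unit; as far as I can tell, plain STATUS= updates are
informational only and do not extend TimeoutStartSec=, which is
exactly the gap):

    #include <stdio.h>
    #include <systemd/sd-daemon.h>

    /* Unit file side: Type=notify, plus (today) a generous
     * TimeoutStartSec= to survive long log replays. */
    static void report_progress(unsigned percent) {
            char buf[64];
            snprintf(buf, sizeof buf,
                     "STATUS=Replaying log: %u%% done", percent);
            sd_notify(0, buf);
    }

    int main(void) {
            for (unsigned p = 0; p <= 100; p += 10) {
                    /* ... do one slice of log replay or rebuild ... */
                    report_progress(p);
            }
            sd_notify(0, "READY=1");  /* startup done, start serving */
            /* ... main service loop ... */
            return 0;
    }

What I'd like is for such pings to also count as "startup is still
making progress", so that the huge timeout becomes unnecessary.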


