[systemd-devel] Significant performance loss caused by commit a65f06b: journal: return -ECHILD after a fork

Michael Chapman mike at very.puzzling.org
Mon Jul 10 11:51:13 UTC 2017


On Mon, 10 Jul 2017, Lennart Poettering wrote:
> On Sat, 08.07.17 16:24, Michael Chapman (mike at very.puzzling.org) wrote:
>
>> On Sat, 8 Jul 2017, vcaputo at pengaru.com wrote:
>>> In doing some casual journalctl profiling and stracing, it became apparent
>>> that `journalctl -b --no-pager` runs across a significant quantity of logs,
>>> ~10% of the time was thrown away on getpid() calls due to commmit a65f06b.
>>>
>>> As-is:
>>> # time ./journalctl -b --no-pager > /dev/null
>>>
>>> real    0m11.033s
>>> user    0m10.084s
>>> sys     0m0.943s
>>>
>>>
>>> After changing journal_pid_changed() to simply return 1:
>>> # time ./journalctl -b --no-pager > /dev/null
>>>
>>>  real    0m9.641s
>>>  user    0m9.449s
>>>  sys     0m0.191s
>>
>> [...]
>>
>>> As this is public sd-journal API, it's somewhat set in stone.
>>
>> So it's arguable whether making an API work in _more_ situations than it
>> previously did is a "breaking" change.
>>
>> I've tried to go through the history for the various *_pid_changed()
>> functions in the APIs systemd presents, and I'm struggling to find a good
>> justification for them. It seems like it was originally added for sd-bus in:
>>
>>   https://github.com/systemd/systemd/commit/d5a2b9a6f455468a0f29483303657ab4fd7013d8
>>
>> And then other APIs copied it to be consistent with sd-bus:
>>
>>   https://github.com/systemd/systemd/commit/a65f06bb27688a6738f2f94b7f055f4c66768d63
>>   https://github.com/systemd/systemd/commit/eaa3cbef3b8c214cd5c2d75b04e70ad477187e17
>>   https://github.com/systemd/systemd/commit/adf412b9ec7292e0c83aaf9ab93e08c2c8bd524a
>>
>> Unfortunately none of these commits describe what will go wrong if one of
>> these APIs is used across fork. Does anybody know what specifically is the
>> problem being addressed here? Can we detect this problem in some
>> other way?
>
> This all stems from my experiences with PulseAudio back in the day:
> People do not grok the effect of fork(): it only duplicates the
> invoking thread, not any other threads of the process, moreover all
> data structures are copied as they are, and that's a time bomb really:
> consider one of our context objects is being used by one thread at the
> moment another thread invokes fork(): the thread using the object is
> busy making changes to the object, rearranging some datastructure (for
> example, rehashing a hash table, because it hit its fill limit) and
> suchlike. Now the fork() happens while it is doing that: the data
> structure will be copied in its half-written, half-updated status quo,
> and in the child process there's no thread that could finish what has
> been started, and there's neither a way to rollback the changes that
> are in progress.
[...]

Thanks, that really does clear things up.

It's a pity glibc doesn't provide an equivalent for pthread_atfork() 
outside of the pthread library. Having a notification that a fork has just 
occurred would allow us to do the PID caching ourselves.

Of course, there's still a problem with people calling the clone syscall 
directly... but I think once people start doing that we have to trust them 
to know what they're doing.


More information about the systemd-devel mailing list